Description

Field of the Invention 

This invention relates to development of novel binding proteins by an iterative process of mutagenesis, expression,
chromatographic selection, and amplification.

Information Disclosure Statement 

The amino acid sequence of a protein determines its three-dimensional (3D) structure, which in turn determines
protein functioning (EPST63, ANFI73). The system of classification of protein structure of Schulz and Schirmer
(SCHU79, ch 5) is adopted herein.

The 3D structure of a protein is essentially unaffected by the identity of the amino acids at some loci; at other loci
only one or a few types of amino acid is allowed (SHOR85, EISE85, REID88). Generally, loci where wide variety
is allowed have the amino acid side group directed toward the solvent, while loci where limited variety is allowed
have the side group directed toward other parts of the protein. (See also SCHU79, p 169-171 and CREI84, p 239-245, 314-315).

The secondary structure (helices, sheets, turns, loops) of a protein is determined mostly by local sequence. Certain
amino acids tend to be correlated with certain secondary structures and the commonly used Chou-Fasman (CHOU74,
CHOU78a, CHOU78b) rules depend on these correlations. However, every amino acid type has been observed in
helices and in both parallel and antiparallel sheets. Pentapeptides of identical sequence are found in different proteins;
in some cases the conformations of the pentapeptides are very different (KABS84, ARGO87).

Turns and loops tolerate insertions and deletions more readily than do other secondary structures (RICH81,
THOR88, SUTC87a); related proteins differ most in loops and turns.

Changing three residues in subtilisin from Bacillus amyloliquefaciens to be the same as the corresponding residues
in subtilisin from B. licheniformis produced a protease that had nearly the same activity as the subtilisin from the latter
organism; 82 differences remained in the sequences. The three residues changed were chosen because they were
the only differences within 7 Angstroms (Å) of the active site (WELL87a).

Schulz and Schirmer summarize many observations on the binding of proteins to other molecules (SCHU79,
p 98-105). For example, haemoglobin alpha chains bind very tightly to haemoglobin beta chains (delta G more negative
[illegible]); basic bovine pancreatic trypsin inhibitor (BPTI) binds tightly to trypsin (Kd = 6.0 x 10^[exponent illegible] M
[illegible]); and avidin binds to biotin (Kd = 1.3 x 10^[exponent illegible] M (CREI84, p 362)). In each case
the binding results from complementarity of the surfaces that come into contact: bumps fit into holes, unlike charges
come together, dipoles align, and hydrophobic atoms contact other hydrophobic atoms. Although bulk water is excluded,
individual water molecules are frequently found filling space in intermolecular interfaces; these waters usually form
hydrogen bonds to one or more atoms of the protein or to other bound water.

The factors affecting protein binding are known ([illegible], CHOT76, SCHU79, p 98-107, and CREI84 [illegible]), but
designing new complementary surfaces has proved difficult. Although some rules have been developed for substituting
side groups [illegible], the side groups of proteins are floppy and it is difficult to predict what
conformation a new side group will take. Further, the forces that bind proteins to other molecules are all relatively weak
and it is difficult to predict the effects of these forces. Hence, it is difficult to design superior binding proteins based on
theory alone.

Enzyme-substrate affinity, however, has fortuitously been increased by protein engineering [illegible]. A point
mutant of tyrosyl tRNA synthetase of Bacillus stearothermophilus exhibits a 100-fold increase in affinity for ATP.
Substitution of one amino acid for another at a surface locus may profoundly alter binding [illegible]
substrate binding without affecting the tertiary structure of the protein. For example, in sickle-cell haemoglobin the
[illegible] p 125-145); the tertiary and quaternary structure of the haemoglobin are not changed (PADL85, WISH75,
[illegible]). Changing a single amino acid in BPTI greatly reduces its binding to trypsin, but some of the [illegible]
the characteristics of binding to and inhibiting chymotrypsin, while others exhibit new binding to elastase
[illegible]. Changes of single amino acids on the surface of the lambda Cro repressor greatly reduce its [illegible].

Thus changing the surface of a binding protein may alter its specificity without abolishing binding activity.

The recently developed techniques of [illegible] ([illegible] and AUSU87). Mutations are generally detected by
sequencing and in some [illegible] and allow researchers to analyze the function of each residue in a
protein or of each base pair in a regulatory DNA sequence (CHEN88). In these analyses, the norm has been
to strive for the classical goal of obtaining mutants carrying a single alteration (AUSU87).

Reverse genetics is often applied to coding regions to determine which residues are most important to protein 
structure and function; isolation of a single mutant at each residue of the protein gives an initial estimate of which 
residues play crucial roles. 

Prior to the method of the present invention, two general approaches have been developed to create novel mutant
proteins through reverse genetics. In one approach, dubbed "protein surgery" (DILL87), a specific substitution is
introduced at a single protein residue to determine the effects on structure and function of specific substitutions
(CRAI85, RAOS87, BASH87). However, many desirable protein alterations require multiple amino acid substitutions and thus
are not accessible through single base changes or even through all possible amino acid substitutions at any one residue.

The other approach has been randomly to generate a variety of mutants at many loci within a cloned gene using
mutagenic chemicals or radiation. The specific location and nature of the change are determined by DNA sequencing
(PAKU86). This approach is limited by the number of colonies that can be examined. Also, it does not take advantage
of any knowledge of the protein structure and its relationship to binding activity.

Progress toward rules governing substitutions of amino acids (ULME83) has been greatly hampered by the
extensive efforts involved in using either method and the practical limitations on the number of colonies that can be
inspected (ROBE86).

The term "saturation mutagenesis" with reference to synthetic DNA is generally taken to mean generation of a
population in which: a) every possible single-base change within a fragment of a gene or DNA regulatory region is
represented, and b) most mutant genes contain only one mutation. Thus a set of all possible single mutations for a 6
base pair length of DNA comprises a population of 18 mutants. Oliphant et al. (OLIP86) and Oliphant and Struhl
(OLIP87) have demonstrated ligation and cloning of highly degenerate oligonucleotides and have applied saturation
mutagenesis to the study of promoter sequence and function. They suggest that similar methods could be used to
study genetic expression of proteins, but they do not say how to: a) choose protein residues to vary, or b) select or
screen mutants with desirable properties.
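
The figure of 18 follows because each of the 6 positions can change to any of the 3 other bases. A minimal sketch in Python that enumerates such a set is given here; the function name and the 6-base example sequence are illustrative assumptions, not taken from the cited work.

```python
from typing import List

BASES = "ACGT"


def single_base_mutants(seq: str) -> List[str]:
    """Return every sequence that differs from `seq` at exactly one position."""
    mutants = []
    for i, original in enumerate(seq):
        for base in BASES:
            if base != original:
                mutants.append(seq[:i] + base + seq[i + 1:])
    return mutants


# Arbitrary 6-base example sequence (an assumption, not from the text).
example = "GAATTC"
mutants = single_base_mutants(example)
print(len(mutants))  # 6 positions x 3 alternative bases each
```

For a fragment of n bases the same enumeration gives 3n single mutants, so the population grows only linearly with fragment length as long as each mutant carries one change.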

Reidhaar-Olson and Sauer (REID88) have used synthetic degenerate oligo-nts to vary simultaneously two or three
residues through all twenty amino acids in the dimer interface of cI repressor from bacteriophage lambda. They give
no discussion of the limits on how many residues could be varied at once nor do they mention the problem of unequal
abundance of DNA encoding different amino acids. They looked for proteins that either had wild-type dimerization or
that did not dimerize. They did not seek proteins having novel binding properties and did not report any.
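
The unequal abundance mentioned above arises because the standard genetic code assigns different numbers of codons to different amino acids. The following sketch counts codons per amino acid under the assumption of a fully random codon with all four bases equally likely at each position; that mixing scheme is an illustrative assumption, not the scheme used in REID88.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# With equimolar bases every codon is equally likely, so the expected
# proportion of each amino acid is (number of its codons) / 64.
codon_counts = Counter(CODON_TABLE.values())
for aa, n in sorted(codon_counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{aa}: {n}/64")
```

Under this assumption leucine, serine and arginine are each encoded six times as often as methionine or tryptophan, and three codons in 64 terminate the chain.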

Several researchers have designed and synthesized proteins de novo. These designed proteins are small and
most have been synthesized in vitro as polypeptides rather than genetically. Gutte and colleagues have made a
polypeptide that binds DDT in 55% ethanol (MOSE83). Recently Moser et al. (MOSE87) reported genetic expression
in E. coli both of the designed 24 residue DDT-binding protein and of fusions of the DDT-binding sequence to LacZ.
They state that design of biologically active proteins is currently impossible.

Erickson et al. (ERIC86) have designed and synthesized a series of proteins that they have named betabellins,
that are meant to have beta sheets. They suggest use of polypeptide synthesis with mixed reagents to produce several
hundred analogous betabellins, and use of a column to recover analogues with high affinity for a chosen target
compound bound to the column. They envision successive rounds of mixed synthesis of variant proteins and purification
by specific binding. They do not discuss how residues should be chosen for variation. Because proteins cannot be
amplified, the researchers must sequence the recovered protein to learn which substitutions improve binding. The
researchers must limit the level of diversity so that each variety of protein will be present in sufficient quantity for the
isolated fraction to be sequenced.

Methods have been developed to separate cells through their affinity to various substances. Methods applied to
animal cells reveal common problems: a) non-specific interactions between cells and affinity supports, and b) irreversible
binding of cells to affinity matrices (BONN85).

Ferenci and collaborators have published a series of papers on the chromatographic isolation of mutants of the
maltose-transport protein LamB of E. coli (WAND79, FERE80a, FERE80b, FERE80c, FERE82a, FERE82b, FERE83,
CLUN84, FERE86a, FERE86b, FERE86c, FERE87a, FERE87b, HEIN87, and HEIN88). The papers report that
spontaneous and induced mutants at the lamB genetic locus can be isolated by chromatography over a column supporting
immobilized maltose, maltodextrins, or starch. The reports speculate that other applications are possible, but
specifically mention only the elucidation of the residues responsible for the selectivity of the maltodextrin pore or similar pore
proteins. The mutant proteins were non-chimeric, and no attempt was made to obtain binding to a new target.

Both FERE86a and CLUN84 point up the difficulties of working with live bacteria that can metabolize chemicals
and change their physiology.

A fragment of a heterologous gene can be introduced into bacteriophage f1 gene III (SMIT85). If the inserted
fragment preserves the original reading frame, expression of the altered gene III causes an inserted segment to appear
in the gene III protein. The resulting strain of f1 virions are adsorbed by an antibody against the protein encoded by the
heterologous DNA. The phage were eluted at pH 2.2 and retained some infectivity. However, the single copy of f1 gene
III was used for insertion of the heterologous gene so that all copies of gene III protein were affected; infectivity of the
resultant phage was reduced 25-fold.

Smith presented his method as a way to isolate cloned genes using antibodies to the gene products. He made no
mention of mutagenizing the inserted genetic material or of inducing novel binding properties in the inserted protein
domain.

A fragment of the repeat region of the circumsporozoite protein from Plasmodium falciparum has been expressed
on the surface of M13 as an insert in the gene III protein (CRUZ88). The recombinant phage were both antigenic and
immunogenic in rabbits. The authors do not suggest mutagenesis of the inserted material.

Gene fragments coding for hepatitis B virus epitopes have been fused to fragments of lamB, and if the fusion is
in a region coding for exposed domains of LamB, the HBV epitopes appear on the cell surface and are immunogenic
(CHAR87). Charbit et al. (CHAR87) suggest use of these engineered strains for development of a live bacterial vaccine;
they did not suggest mutagenesis of the fused heterologous gene fragments, nor development of binding capabilities.

Ladner, US Patent No. 4,704,692, "Computer Based System and Method for Determining and Displaying Possible
Chemical Structures for Converting Double- or Multiple-Chain Polypeptides to Single-Chain Polypeptides" describes a
design method for converting proteins composed of two or more chains into proteins of fewer polypeptide chains, but
with essentially the same 3D structure. There is no mention of variegated DNA and no genetic selection. Ladner and
Bird, WO88/01649 (publ. March 10, 1988) disclose the specific application of computerized design of linker peptides
to the preparation of single chain antibodies.

Ladner, Glick and Bird, WO88/06630 (publ. 7 Sept. 1988) (LGB) speculate that diverse single chain antibody
domains may be screened for binding to a particular antigen by varying the DNA encoding the combining determining
regions of a single chain antibody, subcloning the SCAD gene into the gpV gene of phage lambda so that a SCAD/gpV
chimera is displayed on the outer surface of the phage, and selecting phage which bind to the antigen through
affinity chromatography. The only antigen mentioned is bovine growth hormone. No other binding molecules, targets,
carrier organisms, or outer surface proteins are discussed. Nor is there any mention of the method or degree of
mutagenesis.

Ladner and Bird, WO88/06601 (publ. 7 September 1988) suggest that single chain "pseudodimeric" repressors
(DNA-binding proteins) may be prepared by mutating a putative linker peptide followed by in vivo selection, and that
mutation and selection may be used to create a dictionary of recognition elements for use in the design of asymmetric
repressors. The repressors are not displayed on the outer surface of an organism.

No admission is made that any cited reference is prior art or pertinent prior art, and the dates given are those
appearing on the reference and may not be identical to the actual publication date.



SUMMARY OF THE INVENTION 

This invention relates to the construction, expression, and selection of mutated genes that specify novel proteins 
with desirable binding properties, as well as these proteins themselves. The substances bound by these proteins, 
hereinafter referred to as "targets", may be, but need not be, proteins. Targets may include other biological or synthetic 
macromolecules as well as organic and inorganic molecules.

The novel binding proteins may be obtained: 1) by mutating a gene encoding a known binding protein within the
subsequence encoding a known binding domain, 2) by taking such a subsequence of the gene for a first protein
and combining it with all or part of a gene for a second protein (which may or may not be itself a known binding protein),
3) by mutating a gene encoding a protein which, while not possessing a known binding activity, possesses a secondary
or higher structure that lends itself to binding activity (clefts, grooves, etc.), or 4) by mutating a gene encoding a known
binding protein but not in the subsequence known to cause the binding. The protein from which the novel binding
protein is derived need not have any specific affinity for the target material.

In one embodiment, the invention relates to a method of obtaining a nucleic acid encoding a proteinaceous binding 
domain that binds a predetermined target material, other than the antigen combining site of an antibody which specif- 
ically binds said domain, comprising: 

(a) preparing a variegated population of amplifiable genetic packages, said genetic packages being selected from
the group consisting of cells, spores and viruses, each said genetic package being genetically alterable and having
an outer surface including a genetically determined outer surface protein, each package including a first nucleic
acid construct coding for a chimeric potential binding protein, each said chimeric protein comprising, and each
said construct comprising DNA encoding, (i) a potential binding domain which is a mutant of a predetermined
domain of a predetermined parental protein other than a single chain antibody, comprising one or more identifiable
surface residues, and for which both an affinity molecule and an amino acid sequence are either available or
obtainable, and (ii) an outer surface transport signal for obtaining the display of the potential binding domain on
the outer surface of the genetic package, the expression of which construct results in the display of said chimeric






EP 0 436 597 B1 

potential binding protein and its potential binding domain on the outer surface of said genetic package; and wherein
said variegated population of genetic packages collectively display a plurality of different potential binding domains,
the differentiation among said plurality of different potential binding domains occurring through the at least partially
random variation of one or more predetermined amino acid positions of said parental binding domain to randomly
obtain at each said position an amino acid belonging to a predetermined set of two or more amino acids, the amino
acids of said set occurring at said position in statistically predetermined expected proportions, the genetic message
encapsulated by said genetic packages being amplifiable in vitro or by cell culture of said genetic package and
separable on the basis of the potential binding domain displayed thereon,

(b) causing the expression of said chimeric potential binding proteins and the display of said potential binding
domains on the outer surface of said packages;

(c) contacting said packages with the predetermined target material such that said potential binding domains and 
the target material may interact; 






(d) separating packages displaying a potential binding domain that binds the target material from packages that 
do not so bind on the basis of their ability to bind with the target material in step c, and 

(e) recovering at least one package displaying on its outer surface a chimeric binding protein comprising a stable
successful binding domain (SBD) which bound said target, said package comprising nucleic acid encoding said
successful binding domain, and amplifying said SBD-encoding nucleic acid in vivo or in vitro.

In step (c), the method may further comprise contacting the packages with a second material and isolating packages
which do not bind that second material. Also, after obtaining a novel binding protein recognizing a first predetermined
target, the novel binding protein may be chosen as a parental potential binding protein for the isolation of a derivative
protein which also binds to a second predetermined target.

A chimeric protein comprising (i) at least a segment of an outer surface protein of a filamentous phage, said
segment providing an outer surface transport signal recognized by a cell infected by said phage such that the chimeric
protein is assembled into the coat of phage particles produced by said cell, and (ii) a stable, proteinaceous binding
domain, other than a single chain antibody, said domain comprising one or more identifiable surface residues, that
binds a predetermined target material, other than the antigen combining site of an antibody which specifically binds
said domain, the target being bound sufficiently strongly so that the dissociation constant of the binding domain:target
complex is less than 10^-6 moles/liter, and that is heterologous to said phage.

The invention encompasses the design and synthesis of variegated DNA encoding a family of potential binding
proteins characterized by constant and variable regions, said proteins being designed with a view toward obtaining a
protein that binds a predetermined target.

For the purposes of this invention, the term "potential binding protein" refers to a protein encoded by one species
of DNA molecule in a population of variegated DNA wherein the region of variation appears in one or more
subsequences encoding one or more segments of the polypeptide having the potential of serving as a binding domain for
the target substance.

From time to time, it may be helpful to speak of the "parent sequence" of the variegated DNA. When the novel
binding domain sought is an analogue of a known binding domain, the parent sequence is the sequence that encodes
the known binding domain. The variegated DNA will be identical with this parent sequence at most loci, but will diverge
from it at chosen loci. When a potential binding domain is designed from first principles, the parent sequence is a
sequence which encodes the amino acid sequence that has been predicted to form the desired binding domain, and
the variegated DNA is a population of "daughter DNAs" that are related to that parent by a high degree of sequence
similarity.

The fundamental principle of the invention is one of forced evolution. The efficiency of the forced evolution is greatly
enhanced by careful choice of which residues are to be varied. The 3D structure of the potential binding domain is a
key determinant in this choice. First a set of residues that can simultaneously contact one molecule of the target is
identified. Then all or some of the codons encoding these residues are varied simultaneously to produce a variegated
population of DNA. The variegated population of DNA is used to transform cells so that a variegated population of
genetic packages is produced.

The mixed population of genetic packages is then enriched for
packages containing genes that express proteins that in fact bind to the target ("successful binding domains"). After
one or more rounds of such enrichment, one or more of the chosen genes are examined and sequenced. If desired,
new loci of variation are chosen. The selected daughter genes of one generation then become the parent sequences
for the next generation of variegated DNA, beginning the next "variegation cycle." Such cycles are continued until a
protein with the desired target affinity is obtained.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a schematic showing the relationships between various types of Binding Domains (BD). 
Figure 2 is a flow chart showing the major steps used to create a novel protein with affinity for a predetermined 
target. 

Figure 3 is a schematic of a PBD contacting a molecule of target material. 
Figure 4 is a schematic of the construction of pLG3 from M13mp18 and pBR322.
Figure 5 is a schematic of the construction of pLG7 from pLG3 and synthetic DNA. 

DETAILED DESCRIPTION OF THE INVENTION 

Sec. 0.1: Overview: 

The present invention separates mutated genes that specify novel proteins with desirable binding properties from 
closely related genes that specify proteins with no or undesirable binding properties, by: 1) arranging that the product 
of each mutated gene be displayed on the outer surface of a replicable genetic package that contains the gene, and 
2) using affinity separation incorporating a desirable target material to enrich the population of packages for those 
packages containing genes specifying proteins with improved binding to that target material. 

Let K_D(x,y) be a dissociation constant,

$$K_D(x,y) = \frac{[x][y]}{[x:y]}$$

For the purposes of the appended claims, a protein P is a binding protein if

(1) for one molecular, ionic or atomic species A, the dissociation constant K_D(P,A) < 10^-6 moles/liter, and

(2) for a different molecular, ionic or atomic species B, K_D(P,B) > 10^-1 moles/liter.

As a result of these two conditions, the protein P exhibits specificity for A over B, and a minimum degree of affinity
(or avidity) for A.
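
The two-part definition can be expressed as a simple test. The sketch below is illustrative only: it reads the two thresholds as 10^-6 and 10^-1 moles/liter, and the function names and example concentrations are hypothetical.

```python
import math


def dissociation_constant(free_x: float, free_y: float, complex_xy: float) -> float:
    """K_D(x,y) = [x][y] / [x:y], with equilibrium concentrations in moles/liter."""
    return free_x * free_y / complex_xy


def is_binding_protein(kd_a: float, kd_b: float) -> bool:
    """Apply both conditions: tight binding to A and essentially no binding to B."""
    return kd_a < 1e-6 and kd_b > 1e-1


# Hypothetical equilibrium concentrations for protein P with species A.
kd_pa = dissociation_constant(free_x=1e-6, free_y=1e-6, complex_xy=1e-3)
# Hypothetical dissociation constant for protein P with species B.
kd_pb = 0.5

print(kd_pa, is_binding_protein(kd_pa, kd_pb))
```

A protein that binds both A and B tightly fails the second condition, which is how the definition captures specificity as well as affinity.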

When a domain of a protein is primarily responsible for the protein's ability to specifically bind a chosen target, it 
is referred to herein as a "binding domain" (BD). We engineer the appearance of a stable protein domain, denoted as 
an "initial potential binding domain" (IPBD), on the surface of a genetic package. The present invention is concerned 
with the expression of numerous, diverse, variant "potential binding domains" (PBD), all related to a "parental potential 
binding domain" (PPBD) such as the binding domain of a known binding protein, and with selection and amplification 
of the genes encoding the most successful mutant PBDs. An IPBD is chosen as PPBD to the first round of variegation. 
Selection-through-binding isolates one or more "successful binding domains" (SBD). An SBD from one round of
variegation and selection-through-binding is chosen to be the PPBD for the next round. The invention is not, however,
limited to proteins with a single BD since the method may be applied to any or all of the BDs of the protein, sequentially
or simultaneously. The relationships of the various BDs are illustrated in Figure 1.

The term "variegated DNA" refers to a population of molecules that have the same base sequence through most 
of their length, but that vary at a limited number of defined loci, preferably 5-10 codons. A molecule of variegated DNA 
can be introduced into a plasmid so that it constitutes part of a gene (OLIP86, OLIP87, AUSU87, REID88). When 
plasmids containing variegated DNA are used to transform bacteria, each cell makes a version of the original protein. 
Each colony of bacteria may produce a different version from any other colony. If the variegations of the DNA are 
concentrated at loci known to be on the surface of the protein or in a loop, a population of proteins will be generated, 
many members of which will fold into roughly the same 3D structure as the parent protein. The specific binding prop- 
erties of each member, however, may be different from each other member. It remains to sort out the colonies containing 
genes for proteins with desirable binding properties from those that do not exhibit the desired affinities. 
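
To make the notion of a variegated population concrete, the following sketch enumerates sequences that match a parent everywhere except at a few chosen loci. It works at the amino-acid level rather than the DNA level, which is a simplification; the parent sequence, the loci and the allowed residues are all hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, Sequence


def variegate(parent: str, allowed: Dict[int, Sequence[str]]) -> Iterator[str]:
    """Yield every sequence equal to `parent` except at the loci in `allowed`."""
    loci = sorted(allowed)
    for choice in product(*(allowed[i] for i in loci)):
        residues = list(parent)
        for i, aa in zip(loci, choice):
            residues[i] = aa
        yield "".join(residues)


# Hypothetical 10-residue parent and three hypothetical variable loci.
parent = "MKTAYIAKQR"
allowed = {2: "TSA", 5: "IVLM", 8: "QE"}
population = list(variegate(parent, allowed))
print(len(population))  # product of the set sizes: 3 * 4 * 2

# Upper bound on distinct proteins if n codons each allow all 20 amino acids.
for n in (5, 10):
    print(n, 20 ** n)
```

The population size is the product of the number of residues allowed at each locus, so varying even 5 loci through all twenty amino acids already yields millions of distinct sequences.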

A "single-chain antibody" is a single chain polypeptide comprising at least 200 amino acids, said amino acids
forming two antigen-binding regions connected by an [illegible] to bind
the antigen. Either the two antigen-binding regions must be variable domains of known antibodies, or they must (1)
each fold into a beta barrel of nine strands that are spatially related in the same way as are the nine strands of known
antibody variable light or heavy domains, and (2) fit together in the same way as do the variable domains of said known
antibody. Generally speaking, this will require that, with the exception of the amino acids corresponding to the
hypervariable region, there is at least 38% homology with the amino acids of the variable domain of a known antibody.

The term "affinity separation means" includes, but is not limited to: a) affinity column chromatography, b) batch
elution from an affinity matrix material, c) batch elution from an affinity material attached to a plate, d) fluorescence
activated cell sorting and e) electrophoresis in the presence of target material. "Affinity material" is used to mean a
material with affinity for the material to be purified, called the "analyte". In most cases, the association of the affinity
material and the analyte is reversible so that the analyte can be freed from the affinity material once the impurities are
washed away.

Affinity column chromatography, batch elution from an affinity matrix material held in some container, and batch
elution from a plate are very similar and hereinafter will be treated under "affinity chromatography."

Fluorescence-activated cell sorting (FACS) involves use of an affinity material that is fluorescent per se or is labeled with a
fluorescent molecule. Current commercially available cell sorters require 300 to 1000 molecules of fluorescent dye,
such as Texas red, bound to each cell. FACS can sort 10^3 cells or viruses/sec.

Electrophoretic affinity separation involves electrophoresis of viruses or cells in the presence of target material, 
wherein the binding of said target material changes the net charge of the virus particles or cells. It has been used to 
separate bacteriophages on the basis of charge (SERW87).

The present invention makes use of affinity separation of bacterial cells, or bacterial viruses (or other genetic
packages) to enrich a population for those cells or viruses carrying genes that code for proteins with desirable binding
properties.

In the present invention, the words "select" and "selection" are used exclusively in the genetic sense; i.e., a biological
process whereby a phenotypic characteristic is used to enrich a population for those organisms displaying the desired
phenotype.

The process of the present invention comprises three major parts: 

I. design and production of a replicable genetic package (GP) that displays an IPBD on the surface of the GP,
denoted GP(IPBD),

II. design and implementation of an affinity separation process that separates GP(IPBD)s that bind to a known
affinity molecule from wild-type GPs or GP(IPBD-)s, neither of which binds the known affinity molecule, and

III. design and implementation of a genetic variegation method, denoted structure-directed mutagenesis, wherein
a population of 10^5 or more different GP(PBD)s, denoted GP(vgPBD), is produced.

One affinity separation is called a "separation cycle"; one pass of variegation followed by as many separation cycles
as are needed to isolate an SBD, is called a "variegation cycle". The amino acid sequence of one SBD from one round
becomes the PPBD to the next variegation cycle. We perform variegation cycles iteratively until the desired affinity and
specificity of binding between an SBD and chosen target are achieved.
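
The iteration described above can be sketched as a toy simulation. Everything in it is a hypothetical stand-in: `toy_affinity` replaces the physical affinity separation with a simple score against an arbitrary "ideal" sequence, the loci are chosen by sweeping along the sequence rather than from 3D structure, and all names and sequences are invented for illustration.

```python
from itertools import product
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_affinity(seq: str, ideal: str) -> int:
    """Hypothetical stand-in for binding strength: number of matching residues."""
    return sum(a == b for a, b in zip(seq, ideal))


def variegation_cycle(ppbd: str, loci: List[int], ideal: str) -> str:
    """Vary `loci` through all twenty amino acids and keep the best binder (the SBD)."""
    best = ppbd
    for choice in product(AMINO_ACIDS, repeat=len(loci)):
        residues = list(ppbd)
        for i, aa in zip(loci, choice):
            residues[i] = aa
        candidate = "".join(residues)
        if toy_affinity(candidate, ideal) > toy_affinity(best, ideal):
            best = candidate
    return best


def evolve(ipbd: str, ideal: str, loci_per_cycle: int = 2) -> Tuple[str, int]:
    """Run variegation cycles; the SBD of each cycle is the PPBD of the next."""
    ppbd, cycles = ipbd, 0
    for start in range(0, len(ppbd), loci_per_cycle):
        if toy_affinity(ppbd, ideal) == len(ideal):
            break  # desired affinity reached
        loci = list(range(start, min(start + loci_per_cycle, len(ppbd))))
        ppbd = variegation_cycle(ppbd, loci, ideal)
        cycles += 1
    return ppbd, cycles


result = evolve("AAAAAA", "ACDEFG")
print(result)
```

Varying two loci per cycle means each cycle examines 400 variants, whereas varying all six loci at once would require 20^6 = 64,000,000; this is the practical reason for choosing a limited set of residues in each cycle.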

Part I is a strain construction in which we deal with a single IPBD sequence. Variability may be introduced into
DNA subsequences adjacent to the ipbd subsequence and within the osp-ipbd gene so that the IPBD will appear on
the GP surface. A molecule, such as an antibody, having high affinity for correctly folded IPBD is used to: a) detect
IPBD on the GP surface, b) screen colonies for display of IPBD on the GP surface, or c) select GPs that display IPBD
from a population, some members of which might display IPBD on the GP surface. In one preferred embodiment, Part
I of the process involves:

1) choosing a GP such as a bacterial cell (Sec. 1.1.1), bacterial spore (1.2.1), or phage (1.3.1), having a suitable
outer surface protein (Secs. 1.1.3, 1.2.3, and 1.3.3),

2) choosing a stable IPBD (Sec. 2). 

3) designing an amino acid sequence that: a) includes the IPBD as a subsequence and b) will cause the IPBD to
appear on the GP surface (Secs. 1.1.2, 1.2.2, 1.3.2, and 4),

4) engineering a gene, denoted osp-ipbd, that: a) codes for the designed amino acid sequence, b) provides the
necessary genetic regulation, and c) introduces convenient sites for genetic manipulation (Secs. 4.1, 4.2, 4.3, 5.1,
and 5.2),



5) cloning the osp-ipbd gene into the GP (Sec. 6.1), and 

6) harvesting the transformed GPs (Sec. 7) and testing them for presence of IPBD on the GP surface (Sec. 8);
this test is performed with an affinity molecule having high affinity for IPBD, denoted AfM(IPBD).

In another preferred embodiment, Part I of the process involves:

1) and 2) as above,

3) designing a DNA sequence that: a) encodes the IPBD as a subsequence, b) contains suitable restriction
sites so that random DNA may be operably linked to the ipbd gene fragment, and c) provides the necessary genetic
regulations; this DNA sequence is called a "display probe" (Secs. 1.1.4, 1.2.4, 1.3.4 and 4),

4) constructing that display probe, 

5) cloning the display probe into and amplifying it in a suitable host into the OCV, 

6) cloning random or pseudorandom DNA into one of the restriction sites provided in the display probe, (Sec. 6.2), 
whereby the random or pseudorandom DNA functions as a potential psp., and 

7) harvesting GPs (Sec. 7) screening colonies of the transformed GPs for presence of IPBD on the GP surface; 
this screening is performed with an affinity molecule having high affinity for IPBD, denoted AfM(lPBD), (Sec. 8); 
or, alternatively; 

8) selecting GPs that display IPBD by use of an affinity separation using AfM(IPBD), (Sec. 8). 

Once a GP(IPBD) is produced, it can be used many times as the starting point for developing different novel 
proteins that bind to a variety of different targets. The knowledge of how we engineer the appearance of one IPBD on 
the surface of a GP can be used to design and produce other GP(IPBD)s that display different IPBDs.

Although Part I deals with only a single IPBD, many preparations are made for Part III where we introduce numerous 
mutations into the potential binding domain. References to PBD or pbd in Part I are to indicate a preparatory intent.

In Part II we optimize separation of GP(IPBD) from wild-type GP, denoted wtGP, based on the affinity of IPBD for
AfM(IPBD) and establish the sensitivity of the affinity separation process. In a preferred embodiment, Part II of the 
process of the present invention involves: 

1) preparing affinity columns bearing AfM(IPBD) at various densities of AfM(IPBD)/(volume of matrix) (Sec. 10.1),

2) preparing GP(IPBD)s with various amounts of IPBD per GP,

3) picking a gradient regime for eluting the columns (Sec. 10.1),

4) determining which combination of: a) IPBD/GP, b) density of AfM(IPBD)/(volume of support), c) initial ionic
strength, d) elution rate, and e) (amount of GP)/(volume of support) loaded, gives the best separation of GP(IPBD)
from wtGP (Sec. 10.1),

5) determining the smallest amount of GP(IPBD) that can be isolated from a much larger amount of wtGP using 
the optimal condition, (Sec. 10.2), and 

6) determining the efficiency of the affinity separation procedure (Sec. 10.3). 

Part II optimizes separation of a single type of GP(IPBD) from a large excess of a single different GP. The optimum
conditions will be used in Part III to separate GP(PBD)s that bind the target from GP(PBD)s that do not.
The optimization will be at one or more specific temperatures and at one or more specific pHs. In Part III, the user must
specify the conditions under which the selected SBD should bind the target. If the conditions of intended use differ
markedly from the conditions for which affinity separation was optimized, the user must return to Part II and optimize
the affinity separation for conditions appropriate to the conditions of intended use of the selected SBD.
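
The sensitivity and efficiency determined in Part II can be related to the number of passes through the affinity separation that a rare GP needs before it dominates the population. The following is only a minimal sketch under stated assumptions: the per-pass enrichment factor is a hypothetical input (it would have to be measured as in Sec. 10), and the model assumes that each pass multiplies the ratio of binding to non-binding GPs by that same factor. The function name and parameters are illustrative and are not part of the disclosed process.

```python
import math


def passes_needed(initial_fraction, enrichment_per_pass, target_fraction=0.5):
    """Estimate how many affinity-separation passes are needed.

    Assumption (not from the specification): every pass multiplies the
    odds binder:non-binder by the same factor, enrichment_per_pass.
    """
    if not (0.0 < initial_fraction < 1.0 and 0.0 < target_fraction < 1.0):
        raise ValueError("fractions must lie strictly between 0 and 1")
    if enrichment_per_pass <= 1.0:
        raise ValueError("enrichment_per_pass must exceed 1")

    # Convert fractions to odds so that enrichment is a simple multiplication.
    initial_odds = initial_fraction / (1.0 - initial_fraction)
    target_odds = target_fraction / (1.0 - target_fraction)
    if initial_odds >= target_odds:
        return 0
    return math.ceil(math.log(target_odds / initial_odds)
                     / math.log(enrichment_per_pass))


if __name__ == "__main__":
    # Hypothetical case: one binder per 10^6 GPs, 1000-fold enrichment per pass.
    print(passes_needed(1e-6, 1000.0))
```

Under this simplified model, a lower measured enrichment factor translates directly into more cycles of separation and amplification, which is one reason the separation is optimized before variegation is introduced.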

In Part III we choose a target material and a GP(IPBD) that was developed by the method of Part I and that is
suitable to the target material. Using IPBD as the PPBD to the first cycle of variegation, we prepare
osp-pbd genes that encode a wide variety of PBDs. We use an affinity separation, developed by the method of Part
II, to enrich the population of GP(vgPBD)s for GPs that display PBDs with binding properties relative to the target that
are superior to the binding properties of the PPBD. An SBD selected from one variegation cycle becomes the PPBD
to the next variegation cycle. In a preferred embodiment, Part III of the process of the present invention involves:

1) picking a target molecule (Sec. 11), 

2) picking a GP(IPBD) (Sec. 12),

3) picking a set of several residues in the PPBD to vary based on a) the 3D structure of the IPBD, b) sequences
of homologous proteins, and c) computer or theoretical modeling that indicates which residues can tolerate different
amino acids without disrupting the underlying structure (Sec. 13.1),

4) picking a subset of the residues to be varied simultaneously based on the number of different variants and which
variants are within the detection capabilities of the affinity separation (Sec. 13.2),

5) implementing the variegation by: 

a) synthesizing the part of the osp-pbd gene that encodes the residues to be varied using a specific mixture 
of nucleotide substrates for some or all of the bases encoding residues slated for variation, thereby creating 
a population of DNA molecules, denoted vgDNA (Sec. 13.3), 

b) ligating this vgDNA, by standard methods, into the operative cloning vector (OCV) (e.g., a plasmid or
bacteriophage) (Sec. 14.1),

c) using the ligated DNA to transform cells, thereby producing a population of transformed cells (Sec. 14.2), 

d) culturing (i.e., increasing in number) the population of transformed cells and harvesting the population of
GP(PBD)s, said population being denoted as GP(vgPBD), (Sec. 14.3), 

e) enriching the population for GPs that bind the target by using the affinity separation process developed in 
Part II, with the chosen target molecule as affinity molecule (Sec. 15), 

f) repeating steps III.5.d and III.5.e until a GP(SBD) having improved binding to the target is isolated (Sec. 15), and

g) testing the isolated SBD or SBDs for affinity and specificity for the chosen target (Sec. 15.8), 

6) repeating steps III.3, III.4, and III.5 until the desired degree of binding is obtained. 
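
Step III.4 weighs the number of residues varied at once against the number of variants the separation can handle. The sketch below illustrates that arithmetic only. It assumes, purely for illustration, that each varied residue may take any of a fixed number of amino acids and that the usable population size (`library_capacity`) is a known number; both names and the example values are hypothetical and not taken from the specification.

```python
from math import prod


def count_variants(allowed_per_residue):
    """Number of distinct protein sequences when residue i may take
    allowed_per_residue[i] different amino acids."""
    return prod(allowed_per_residue)


def max_residues(library_capacity, choices_per_residue=20):
    """Largest number of residues that can be varied simultaneously
    while the variant count stays within library_capacity."""
    n = 0
    while choices_per_residue ** (n + 1) <= library_capacity:
        n += 1
    return n


if __name__ == "__main__":
    # Three residues, each allowed all 20 amino acids.
    print(count_variants([20, 20, 20]))
    # Hypothetical capacity of 10^8 independent transformants.
    print(max_residues(10**8))
```

Because the variant count grows geometrically with the number of residues, restricting the allowed amino acids at some positions lets more positions be varied within the same capacity.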

Part III is repeated for each new target material. Part I need be repeated only if no GP(IPBD) suitable to a chosen 
target is available. Part II need be repeated for each newly-developed GP(IPBD) and for previously-developed
GP(IPBD)s if the intended conditions of use of a novel binding protein differ significantly from the conditions of previous
optimizations. 

Sec. 0.2: Abbreviations: 

The following abbreviations will be used throughout the present invention:

Abbreviation    Meaning

GP              Genetic Package, e.g. a bacteriophage
X               Any protein
x               The gene for protein X
IPBD            Initial Potential Binding Domain, e.g. BPTI
PBD             Potential Binding Domain, e.g. a derivative of BPTI
SBD             Successful Binding Domain, e.g. a derivative of BPTI selected for binding to a target
PPBD            Parental Potential Binding Domain, i.e. an IPBD or an SBD from a previous selection
OSP             Outer Surface Protein, e.g. coat protein of a phage or LamB from E. coli
OSP-PBD         Fusion of an OSP and a PBD, order of fusion not specified
OSTS            Outer Surface Transport Signal
GP(x)           A genetic package containing the x gene
GP(X)           A genetic package that displays X on its outer surface
{Q}             An affinity matrix supporting "Q", e.g. {T4 lysozyme} is T4 lysozyme attached to an affinity matrix
AfM(W)          A molecule having affinity for "W", e.g. trypsin is an AfM(BPTI)
XINDUCE         A chemical that can induce expression of a gene, e.g. IPTG for the lacUV5 promoter
OCV             Operative Cloning Vector
K_D             A bimolecular dissociation constant, K_D = [A][B]/[A:B]
K_T             K_T = [T][SBD]/[T:SBD] (T is a target)
K_N             K_N = [N][SBD]/[N:SBD] (N is a non-target)
DoAMoM          Density of AfM(W) on affinity matrix
Abun(x)         Abundance of DNA molecules encoding amino acid x
OMP             Outer membrane protein
nt              nucleotide
                Error level in synthesizing vgDNA
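
The dissociation constants defined above can be turned into a quick estimate of how much of a binding domain is occupied at a given concentration of its partner. The following is a minimal sketch that assumes simple 1:1 binding at equilibrium with the partner in excess, so that its free concentration is approximately its total concentration. From K_D = [A][B]/[A:B], the fraction of B that is bound is [A]/([A] + K_D). The function names and the example concentrations are illustrative only.

```python
def fraction_bound(free_partner_conc, k_d):
    """Fraction of a binding domain occupied by its partner.

    Follows from K_D = [A][B]/[A:B]:
    [A:B] / ([B] + [A:B]) = [A] / ([A] + K_D).
    Both arguments must use the same concentration units (e.g. molar).
    """
    if free_partner_conc < 0 or k_d <= 0:
        raise ValueError("concentration must be >= 0 and K_D must be > 0")
    return free_partner_conc / (free_partner_conc + k_d)


def specificity_ratio(k_n, k_t):
    """Ratio K_N / K_T; larger values mean the SBD prefers the target T
    over the non-target N."""
    if k_n <= 0 or k_t <= 0:
        raise ValueError("dissociation constants must be > 0")
    return k_n / k_t


if __name__ == "__main__":
    # Hypothetical values: K_T = 1 nM, K_N = 1 uM, target at 9 nM.
    print(fraction_bound(9e-9, 1e-9))
    print(specificity_ratio(1e-6, 1e-9))
```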



Sec. 0.3: Standard sequencing method: 

The present invention is not limited to a single method of determining the sequence of nucleotides (nts) in DNA 
subsequences. Sequencing reactions, agarose gel electrophoresis, and polyacrylamide gel electrophoresis (PAGE)
are performed by standard procedures (AUSU87).

The present invention is not limited to a single method of determining protein sequences, and reference in the 
appended claims to determining the amino acid sequence of a domain is intended to include any practical method or 
combination of methods, whether direct or indirect. The preferred method, in most cases, is to determine the sequence 
of the DNA that encodes the protein and then to infer the amino acid sequence. In some cases, standard methods of 
protein-sequence determination may be needed to detect post-translational processing. 

The major steps in the process of making and isolating a novel binding protein with affinity for a chosen target 
material are illustrated in Figure 2. 

Sec. 1: Specification of Genetic Package and Means for Displaying a Heterologous Binding Domain On Its Outer
Surface:

Sec. 1.0: General Requirements for Genetic Packages:

It is emphasized that the GP on which selection-through-binding will be practiced must be capable, after the
selection, either of growth in some suitable environment or of in vitro amplification and recovery of the encapsulated
genetic message. During at least part of the growth, the increase in number must be approximately exponential with
respect to time. The component of a population that exhibits the desired binding properties may be quite small, for
example one in 10^6 or less. Once this component of the population is separated from the non-binding components, it
must be possible to amplify it. Culturing viable cells is the most powerful amplification of genetic material known and 
is preferred. Genetic messages can also be amplified in vitro, but this is not preferred. 
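
The requirement for exponential growth can be made concrete with a short calculation. The sketch below assumes idealized doubling with no lag phase or losses, and the doubling time used in the example is a hypothetical value; it only illustrates why a component as rare as one in 10^6 can be recovered in a practical time once it has been separated.

```python
import math


def doublings_needed(start_count, target_count):
    """Number of whole doublings needed to grow start_count to at least target_count."""
    if start_count <= 0 or target_count <= 0:
        raise ValueError("counts must be positive")
    if target_count <= start_count:
        return 0
    return math.ceil(math.log2(target_count / start_count))


def culture_time_minutes(start_count, target_count, doubling_time_min):
    """Idealized culture time, ignoring lag phase and saturation."""
    return doublings_needed(start_count, target_count) * doubling_time_min


if __name__ == "__main__":
    # A single recovered GP amplified back to 10^6 copies,
    # assuming a hypothetical doubling time of 30 minutes.
    print(doublings_needed(1, 10**6))
    print(culture_time_minutes(1, 10**6, 30))
```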

A GP may typically be a vegetative bacterial cell, a bacterial spore, or a bacterial DNA virus. A strain of any living
cell or virus is potentially useful if the strain can be: 

1) maintained in culture, 

2) affinity separated and retain its viability, 

3) genetically altered with reasonable facility, and 

4) manipulated to display the potential binding protein domain where it can interact with the target material during
affinity separation.

DNA encoding the IPBD sequence may be operably linked to DNA encoding at least the outer surface transport 
signal of an outer surface protein (OSP) native to the GP so that the IPBD is displayed on the outer surface of the GP. 
It should be possible to cause a genetic package to display the IPBD or PBD on its outer surface without adversely 
affecting the viability of the GP or the binding characteristics of the IPBD or PBD, if the fusion is near domain boundaries 
(BECK83, CRAW87, TOTH86, SMIT85, MANO86; and cf. ROSS81, HOLL83).

Those characteristics of a protein that are recognized by a cell and that cause it to be transported out of the
cytoplasm and displayed on the cell surface will be termed "outer-surface transport signals".

The replicable genetic entity (phage or plasmid) that carries the osp-pbd genes (derived from the osp-ipbd gene)
through the selection-through-binding process (see Sec. 14) is referred to hereinafter as the operative cloning vector
(OCV). When the OCV is a phage, it may also serve as the genetic package. The choice of a GP is dependent in part
on the availability of a suitable OCV and suitable OSP.

Preferably, the GP is readily stored, for example, by freezing. If the GP is a cell, it should have a short doubling
time, such as 20-40 minutes. If the GP is a virus, it should be prolific, e.g., a burst size of at least 100/infected cell.
GPs which are finicky or expensive to culture are disfavored. The GP should be easy to harvest, preferably by
centrifugation. The GP is preferably stable for a temperature range of -70 to 42°C (stable at 4°C for several days or weeks);
resistant to shear forces found in HPLC; insensitive to UV; tolerant of desiccation; and resistant to a pH of 2.0 to 10.0,
surface active agents such as SDS or Triton, chaotropes such as 4M urea or 2M guanidinium HCl, common ions such
as K+, Na+, and SO4 2-, common organic solvents such as ether and acetone, and degradative enzymes. Finally, there
must be a suitable OCV (see Sec. 3).
Preferably, the 3D structure of the OSP and the sequence of the OSP gene are known. If the 3D structure
is not known, there is preferably knowledge of which residues are exposed on the cell surface, the location of the
domain boundaries within the OSP, and/or of successful fusions of the OSP and a foreign insert. The OSP preferably
appears in numerous copies on the outer surface of the GP, and preferably serves a non-essential function. It is
desirable that the OSP not be post-translationally processed, or at least that this processing be understood.

The preferred GP, OCV, and OSP are those for which the fewest serious obstacles can be seen, rather than the
one that scores highest on any one criterion.

Next, we consider general answers to the questions posed in this step for the cases of: a) vegetatively growing 
bacterial cells (Sec. 1.1), b) bacterial spores (Sec. 1.2), and c) phage (Sec. 1.3). Preferred OSPs for several GPs are given
in Table 2. 

Sec. 1.1: Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial strain which may be grown in culture. The important questions
in this case are: a) do we know enough about mechanisms that localize proteins on the outside of the cell, b) will the
IPBD fold in the environment of the outer membrane, and c) will cells change expression of osp-pbd, derived from
osp-ipbd, during affinity separation? Some IPBDs may need large or insoluble prosthetic groups, such as an Fe4S4 cluster,
that are available within the cell, but not in the medium. The formation of Fe4S4 clusters found in some ferredoxins is
catalyzed by enzymes found in the cell (BONO85). IPBDs that require such prosthetic groups may fail to fold or function
if displayed on bacterial cells.

Sec. 1.1.1: Preferred Bacterial Cells as GP:



In view of the extensive knowledge of E. coli, a strain of E. coli, defective in recombination, is the strongest candidate
as a bacterial GP. Other preferred candidates are Salmonella typhimurium, Bacillus subtilis, and Pseudomonas
aeruginosa.

Sec. 1.1.2: Preferred Outer Surface Proteins for Displaying IPBDs on Bacterial Cells:

Gram-negative bacteria have outer-membrane proteins (OMPs) that form a subset of OSPs. Many OMPs span the
membrane one or more times. The signals that cause OMPs to localize in the outer membrane are encoded in the
amino acid sequence of the mature protein. Fusions of fragments of omp genes with fragments of an x gene have led
to X appearing on the outer membrane (BENS84, CLEM81). If no fusion data are available, then we fuse an ipbd
fragment to various fragments of the osp gene and obtain GPs that display the osp-ipbd fusion on the cell outer surface
by screening or selection for the display-of-IPBD phenotype.

Oliver has reviewed mechanisms of protein secretion in bacteria (OLIV85 and OLIV87). Nikaido and Vaara
(NIKA87) have reviewed mechanisms by which proteins become localized to the outer membrane of Gram-negative
bacteria. For example, the LamB protein of E. coli is synthesized with a typical signal sequence which is subsequently
removed. Benson et al. (BENS84) showed that LamB-LacZ fusion proteins would be deposited in the outer membrane
of E. coli when residues 1-49 of the mature LamB protein are included in the fusion, but that residues 1-43 are
insufficient.

LamB of E. coli is a porin for maltose and maltodextrin transport, and serves as the receptor for adsorption of
bacteriophages lambda and K10. This protein has been purified to homogeneity (ENDE78) and shown to function as
a trimer (PALV79). Mutations to phage resistance have been used to define the parts of the LamB protein that adsorb
each phage (ROAM80, CLEM81, CLEM83, GEHR87).

Topological models have been developed that describe the function of phage receptor and maltodextrin transport.
The models describe these domains and their locations with respect to the surfaces of the outer membrane (CLEM81,
CLEM83, CHAR84, HEIN88).

LamB is transported to the outer membrane if a functional N-terminal sequence is present; further, the first 49
amino acids of the mature sequence are required for successful transport (BENS84). Homology between parts of LamB
protein and other outer membrane proteins OmpC, OmpF and PhoE has been detected (NIKA84), including homology
between LamB amino acids 39-49 and sequences of the other proteins. These subsequences may label the proteins
for transport to the outer membrane. Further, monoclonal antibodies derived from mice immunized with purified LamB
have been used to characterize four distinct topological and functional regions, two of which are concerned with maltose
transport (GABA82).

Sec. 1.1.3: Choice of Insertion Site for IPBD in Bacterial Cell OSP:



For fusions of the phoA gene into the coding sequence for an integral membrane protein, the PhoA domain is localized
according to where in the integral membrane protein the phoA gene was inserted. If
phoA is inserted after an amino acid which normally is found in the cytoplasm, then PhoA appears in the cytoplasm.
If phoA is inserted after an amino acid normally found in the periplasm, however, then the PhoA domain is located
on the periplasmic side of the membrane, and anchored in it. Beckwith and colleagues (BECK88) have extended these
observations to the lacZ gene, which can be inserted into genes for integral membrane proteins such that the LacZ domain
appears in either the cytoplasm or the periplasm according to where the lacZ gene was inserted.

OSP-IPBD fusion proteins need not fill a structural role in the outer membranes of Gram-negative bacteria because
parts of the outer membranes are not highly ordered. For large OSPs there is likely to be one or more sites at which
osp can be truncated and fused to ipbd such that cells expressing the fusion will display IPBDs on the cell surface. If
fusions between fragments of osp and x have been shown to display X on the cell surface, we can design an osp-ipbd
gene by substituting ipbd for x in the DNA sequence. Otherwise, successful OMP-IPBD fusion is preferably sought by
fusing fragments of the best omp to an ipbd, expressing the fused gene, and testing the resultant GPs for the
display-of-IPBD phenotype. We use the available data about OMP to pick the point or points of fusion between omp and ipbd to
maximize the likelihood that IPBD will be displayed. Alternatively, we truncate osp at several sites or in a manner that
produces osp fragments of variable length and fuse the osp fragments to ipbd; cells expressing the fusion are screened
or selected for display of IPBDs on the cell surface. An additional alternative is to include short segments of random
DNA in the fusion of omp fragments to ipbd and then screen or select the resulting variegated population for members
exhibiting the display-of-IPBD phenotype.

The promoter for the osp-ipbd gene, preferably, is subject to regulation by a small chemical inducer, such as
isopropyl thiogalactoside (IPTG) (lacUV5 promoter). It need not come from a natural osp gene; any regulatable bacterial
promoter can be used (MANI82).

Once a genetic packaging system employing vegetative bacterial cells has been designed, it is time to choose an 
IPBD (Sec. 2). 

Sec. 1.1.4: In Vivo Selection for Pseudo-osp Gene From Random DNA Inserts in Bacterial Cells:

As an alternative to choosing a natural OSP and an insertion site in the OSP, we can construct a gene comprising:
a) a regulatable promoter (e.g. lacUV5), b) a Shine-Dalgarno sequence, c) a periplasmic transport signal sequence,
d) a fusion of the ipbd gene with a segment of random DNA (as in Kaiser et al. (KAIS87)), e) a stop codon, and f) a
transcriptional terminator. The random DNA, which preferably comprises 90-300 bases, encodes numerous potential
OSTSs (cf. KAIS87). The fusion of ipbd and the random DNA could be in either order, but ipbd upstream is slightly
preferred. Isolates from the population generated in this way can be screened for display of the IPBD. Preferably, a
version of selection-through-binding is used to select GPs that display IPBD on the GP surface, and thus contain a
DNA insert encoding a functional OSTS. Alternatively, clonal isolates of GPs may be screened for the display-of-IPBD
phenotype.

The preference for ipbd upstream of the random DNA arises from consideration of the manner in which the
successful GP(IPBD) will be used. In Part III, we will introduce numerous mutations into the pbd region of the osp-pbd
gene, some of which might include gratuitous stop codons. If pbd precedes the random DNA, then gratuitous stop
codons in pbd lead to no OSP-PBD protein appearing on the cell surface. If pbd follows the random DNA, then gratuitous
stop codons in pbd might lead to incomplete OSP-PBD proteins appearing on the cell surface. Incomplete proteins
often are non-specifically sticky so that GPs displaying incomplete PBDs are easily removed from the population.
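
The role of stop codons in random DNA can be illustrated with a short calculation. The sketch below assumes that all four bases are equally likely at every position of the random segment and that the standard genetic code applies (three stop codons among 64); neither assumption is stated in the specification, and real synthetic mixtures may be biased. It estimates the fraction of random segments of a given length that can be read through in frame without encountering a stop codon.

```python
from itertools import product

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def sense_codon_fraction(bases="ACGT"):
    """Fraction of all triplets over `bases` that are not stop codons."""
    codons = ["".join(triplet) for triplet in product(bases, repeat=3)]
    sense = [codon for codon in codons if codon not in STOP_CODONS]
    return len(sense) / len(codons)


def open_frame_probability(n_bases):
    """Probability that a random in-frame segment of n_bases contains
    no stop codon, assuming independent, equiprobable bases."""
    n_codons = n_bases // 3
    return sense_codon_fraction() ** n_codons


if __name__ == "__main__":
    # The preferred range for the random DNA is 90-300 bases.
    for n_bases in (90, 300):
        print(n_bases, round(open_frame_probability(n_bases), 4))
```

Under these assumptions the fraction of fully open segments falls quickly with length, which is consistent with relying on screening or selection to find the members of the population that encode a functional OSTS.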

Sec. 1.2: Displaying IPBD on bacterial spores: 

Bacterial spores have desirable properties as GP candidates. Bacillus spores neither actively metabolize nor alter
the proteins on their surface. Moreover, spores are much more resistant than vegetative bacterial cells or phage to
chemical and physical agents. Spores have the disadvantage that the molecular mechanisms that trigger sporulation
are less well worked out than is the formation of M13 or the export of protein to the outer membrane of E. coli.

Sec. 1.2.1: Preferred Bacterial Spores for Use as GPs:

Bacteria of the genus Bacillus form endospores that are extremely resistant to damage by heat, radiation,
desiccation, and toxic chemicals (reviewed by Losick et al. (LOSI86)). These spores have complex structure and
morphogenesis that is species-specific and only partially elucidated. The following observations are relevant to the use of
Bacillus spores as genetic packages.

Plasmid DNA is commonly included in spores. Plasmid-encoded proteins have been observed on the surface of
Bacillus spores (DEBR86). Sporulation involves complex temporal regulation that is moderately well understood
(LOSI86). The sequences of several sporulation promoters are known; coding sequences operatively linked to such
promoters are expressed only during sporulation (RA?C87).

Donovan et al. have identified several polypeptide components of B. subtilis spore coat (DONO87); the sequences
of two complete coat proteins and amino-terminal fragments of two others have been determined. Some components
of the spore are synthesized in the forespore, e.g. small acid-soluble spore proteins (ERRI88), while other components
are synthesized in the mother cell and appear in the spore (e.g. the coat proteins). This spatial organization of synthesis
is controlled at the transcriptional level.

Spores self-assemble, but the signals that cause various proteins to localize in different parts of the spore are not
well understood; presumably, the signals controlling deposition of the coat proteins from the cytoplasm of the mother
cell onto the spore coat are embedded in the polypeptide sequence. Some, but not all, of the coat proteins are
synthesized as precursors and are then processed by specific proteases before deposition in the spore coat (DONO87).
Viable spores that differ only slightly from wild-type are produced in B. subtilis even if any one of four coat proteins is
missing (DONO87). Disulfide bonds form within the spore (thiol reducing agents are needed to solubilize several of
the proteins of the coat). The 12 kd coat protein, CotD, contains 5 cysteines. CotD also contains an unusually high
number of histidines (16) and prolines (7). The 11 kd coat protein, CotC, contains only one cysteine and one methionine.
CotC has a very unusual amino-acid sequence with 19 lysines (K) appearing as 9 K-K dipeptides and one isolated K.
There are also 20 tyrosines (Y), of which 10 appear as 5 Y-Y dipeptides. Peptides rich in Y and K are known to become
crosslinked in oxidizing environments (DEVO78, WAIT83, WAIT86). CotC contains 16 D and E amino acids, a number
nearly equal to the 19 Ks. There are no A, F, R, I, L, N, P, Q, S, or W amino acids in CotC. Neither CotC nor CotD is
post-translationally cleaved. The proteins CotA and CotB are post-translationally cleaved.

Endospores from the genus Bacillus are more stable than are exospores from Streptomyces. Bacillus subtilis forms
spores in 4 to 6 hours but Streptomyces species may require days or weeks to sporulate. In addition, genetic knowledge
and manipulation is much more developed for B. subtilis than for other spore-forming bacteria. Thus Bacillus spores
are preferred over Streptomyces spores. Bacteria of the genus Clostridium also form very durable endospores, but
Clostridia, being strict anaerobes, are not convenient to culture. The choice of a species of Bacillus is governed by
knowledge and availability of cloning systems and by how easily sporulation can be controlled. A particular strain is
chosen by the criteria listed in Sec. 1.0. Many vegetative biochemical pathways are shut down when sporulation begins
so that prosthetic groups might not be available.

Sec. 1.2.2: Preferred Outer-Surface Proteins for Displaying IPBD on Bacterial Spores:



If a spore is chosen as GP, the promoter is the most important part of the osp gene, because the promoter of a
spore coat protein is most active: a) when spore coat protein is being synthesized and deposited onto the spore and
b) in the specific place that spore coat proteins are being made. In B. subtilis, some of the spore coat proteins are
post-translationally processed by specific proteases. It is valuable to know the sequences of precursors and mature coat
proteins so that we can avoid incorporating the recognition sequence of the specific protease into our construction of
an OSP-IPBD fusion. The sequence of a mature spore coat protein contains information that causes the protein to be
deposited in the spore coat; thus gene fusions that include some or all of a mature coat protein sequence are preferred
for screening or selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD fragments are likely to cause IPBD to appear on the spore surface. The
genes cotC and cotD are preferred osp genes because CotC and CotD are not post-translationally cleaved.
Subsequences from cotA or cotB could also be used to cause an IPBD to appear on the surface of B. subtilis spores, but we
must take the post-translational cleavage of these proteins into account. DNA encoding IPBD could be fused to a
fragment of cotA or cotB at either end of the coding region or at sites interior to the coding region. Spores could then
be screened or selected for the display-of-IPBD phenotype.

To date, no Bacillus sporulation promoter has been shown to be inducible by an exogenous chemical inducer as
is the lac promoter of E. coli. Nevertheless, the quantity of protein produced from a sporulation promoter can be controlled
by other factors, such as the DNA sequence around the Shine-Dalgarno sequence or codon usage.

Sec. 1.2.3: Choice of Insertion Site for IPBD in OSP of Bacterial Spore:

The considerations governing insertion site in the spore OSP are the same as those given in Sec. 1.1.3.

Sec. 1.2.4: In Vivo Selection for Pseudo-osp Genes From Random DNA Inserts in Bacterial Spores:

Although the considerations for spores are nearly identical to the considerations for vegetative bacterial cells (Sec.
1.1), the available information on the mechanisms that cause proteins to appear on spores is meager, so that use of
the random-DNA approach becomes a more attractive option.

We can use the approach described above in Sec. 1.1.4 for attaching an IPBD to an E. coli cell, except that a) a
sporulation promoter is used, and b) no periplasmic signal sequence should be present.

Sec. 1.3: Displaying IPBD on Outer Surface of Phages:

Sec. 1.3.1: Preferred Phages for Use as GPs:

Unlike bacterial cells and spores, choice of a phage depends strongly on knowledge of the 3D structure of an OSP 
and how it interacts with other proteins in the capsid. The size of the phage genome and the packaging mechanism 
are also important because the phage genome itself is the cloning vector. The osp-ipbd gene must be inserted into the
phage genome; therefore: 

1) the virion must be capable of accepting the insertion or substitution of genetic material, and 

2) the genome of the phage must be small enough to allow convenient manipulation. 

Additional considerations in choosing phage are: 1) the morphogenetic pathway of the phage determines the
environment in which the IPBD will have opportunity to fold, 2) IPBDs containing essential disulfides may not fold within a cell,
3) IPBDs needing large or insoluble prosthetic groups may not fold if secreted because the prosthetic group is lacking,
and 4) when variegation is introduced in Part III, multiple infections could generate hybrid GPs that carry the gene for
one PBD but have at least some copies of a different PBD on their surfaces; it is preferable to minimize this possibility.

Bacteriophages are excellent candidates for GPs because there is little or no enzymatic activity associated with 
intact mature phage, and because the genes are inactive outside a bacterial host, rendering the mature phage particles 
metabolically inert. The filamentous phage M13 and bacteriophage PhiX174 are of particular interest. 



Filamentous phage:

The entire life cycle of the filamentous phage M13, a common cloning and sequencing vector, is well understood.
M13 and f1 are so closely related that we consider the properties of each relevant to both (RASC86); any differentiation
is for historical accuracy. The genetic structure (the complete sequence (SCHA78), the identity and function of the ten
genes, and the order of transcription and location of the promoters) of M13 is well known, as is the physical structure
of the virion (BANN81, BOEK80, CHAN79, ITOK79, KAPL78, KUHN85b, KUHN87, MAKO80, MARV78, MESS78,
OHKA81, RASC86, RUSS81, SCHA78, SMIT85, WEBS78, and ZIMM82); see RASC86 for a recent review of the
structure and function of the coat proteins.

Relevant facts about M13 are disclosed in Example I.

Bacteriophage PhiX174:



The bacteriophage PhiX174 is a very small icosahedral virus which has been thoroughly studied by genetics,
biochemistry, and electron microscopy (see The Single-Stranded DNA Phages (DENH78)). To date, no proteins from
PhiX174 have been studied by X-ray diffraction. PhiX174 is not used as a cloning vector because PhiX174 can accept
almost no additional DNA; the virus is so tightly constrained that several of its genes overlap. Chambers et al. (CHAM82)
showed that mutants in gene G are rescued by the wild-type G gene carried on a plasmid so that the host supplies
this protein.

Three gene products of PhiX174 are present on the outside of the mature virion: F (capsid), G (major spike protein,
60 copies per virion), and H (minor spike protein, 12 copies per virion). The G protein comprises 175 amino acids,
while H comprises 328 amino acids. The F protein interacts with the single-stranded DNA of the virus. The proteins F,
G, and H are translated from a single mRNA in the virally infected cells.



Large DNA Phages 



Phage such as lambda or T4 have much larger genomes than do M13 or PhiX174. Large genomes are less conveniently
manipulated than small genomes. A phage with a large genome, however, could be used if genetic manipulation
is sufficiently convenient. Phage such as lambda and T4 have more complicated 3D capsid structures than
M13 or PhiX174, with more OSPs to choose from. Phage lambda virions and phage T4 virions form intracellularly, so
that IPBDs requiring large or insoluble prosthetic groups might fold on the surfaces of these phage. Phage lambda and
phage T4 are not preferred; however, derivatives of these phages could be constructed to overcome these disadvantages.

RNA Phages

RNA phage, such as Qbeta, are not preferred because manipulation of RNA is much less convenient than is the 
manipulation of DNA. Although competent RNA bacteriophage are not preferred, useful genetically altered RNA-containing
particles could be derived from RNA phage, such as MS2.

To use MS2 as a GP, we would need to eliminate most of the natural viral genome so that an osp-ipbd gene could 
fit into the protein capsid. It is known that the A protein binds sequence-specifically to a site at the 5' end of the + RNA 
strand triggering formation of RNA-containing particles if coat protein is present. If a message containing the A protein 
binding site and the gene for a chimera of coat protein and a PBD were produced in a cell that also contained A protein 
and wild-type coat protein (both produced from regulated genes on a plasmid), then the RNA coding for the chimeric
protein would get packaged. A package comprising RNA encapsulated by proteins encoded by that RNA satisfies the 
major criterion that the genetic message inside the package specifies something on the outside. The particles by 
themselves are not viable. After isolating the packages that carry an SBD, we would need to: 

1) separate the RNA from the protein capsid,

2) reverse transcribe the RNA into DNA, using AMV or MMTV reverse transcriptase, and

3) amplify the DNA by several cycles of polymerase chain reaction (PCR) until there is enough to subclone the
recovered genetic message into a plasmid for sequencing and further work.

Alternatively, helper phage could be used to rescue the isolated phage. 

Sec. 1.3.2: Preferred Outer-Surface Proteins for Displaying IPBDs on Phages: 


For a given bacteriophage, the preferred OSP is usually one that is present on the phage surface in the largest 
number of copies, as this allows the greatest flexibility in varying the ratio of OSP-IPBD to wild type OSP and also 
gives the highest likelihood of obtaining satisfactory affinity separation. Moreover, a protein present in only one or a 
few copies usually performs an essential function in morphogenesis or infection; mutating such a protein by addition 
or insertion is likely to result in reduction in viability of the GP.

It is preferred that the wild-type osp gene be preserved. The ipbd gene fragment may be inserted either into a
second copy of the recipient osp gene or into a novel engineered osp gene. The preferred OSP for use when the GP
is M13 is the gene III protein (see Example I).

Sec. 1.3.3: Choice of Insertion site for IPBD in OSP:

The user must choose a site in the candidate OSP gene for inserting an ipbd gene fragment. The coats of most
bacteriophage are highly ordered. Thus in bacteriophage, unlike the cases of bacteria and spores, it is important to
retain most or all of the residues of the parental OSP in engineered OSP-IPBD fusion proteins. A preferred site for
insertion of the ipbd gene into the phage osp gene is one in which: a) the IPBD folds into its original shape, b) the OSP
domains fold into their original shapes, and c) there is no interference between the two domains. 

If there is a 3D model of the phage that indicates that either the amino or carboxy terminus of an OSP is exposed 
to solvent, then the exposed terminus of that mature OSP becomes the prime candidate for insertion of the ipbd gene.
A low resolution 3D model suffices.

In the absence of a 3D structure, the amino and carboxy termini of the mature OSP are the best candidates for
insertion of the ipbd gene. A functional fusion may require additional residues between the IPBD and OSP domains
to avoid unwanted interactions between the domains. Random-sequence DNA, or DNA coding for a specific sequence
of a protein homologous to the IPBD or OSP, can be inserted between the osp fragment and the ipbd fragment if needed.
Fusion at a domain boundary within the OSP is also a good approach for obtaining a functional fusion.

There are several methods of identifying domains. Methods that rely on atomic coordinates have been reviewed
by Janin and Chothia (JANI85); see also ROSE85, RASH84, VITA84, PABO79, POTE83, and SCOT87.

If the only structural information available is the amino acid sequence of the candidate OSP, we use the sequence 
to predict turns and loops. There is a high probability that some of the loops and turns will be correctly predicted (cf. 
Chou and Fasman (CHOU74)); these lo[...]


Sec. 1.3.4: In Vivo Selection for Pseudo-OSP Gene from Random DNA Inserts in Bacterial Spores:

Alternatively, a functional insertion site may be determined by generating a number of recombinant constructions
and selecting the functional strain by phenotypic characteristics. Because the OSP-IPBD must fulfill a structural role
in the phage coat, it is unlikely that any particular random DNA sequence coupled to the ipbd gene will produce a fusion
protein that fits into the coat in a functional way. Nevertheless, random DNA inserted between large fragments of a
coat protein gene and the ipbd gene will produce a population that is likely to contain one or more members that display
the IPBD on the outside of a viable phage. A display probe, similar to that defined in Sec. 1.1.4, is constructed and random
DNA sequences are cloned into appropriate sites.



Sec. 2: Choice of IPBD:

An IPBD may be chosen from naturally occurring proteins or domains of naturally occurring proteins, or may be
designed from first principles. A designed protein may have advantages over natural proteins if: a) the designed protein
is more stable, b) the designed protein is smaller, and c) the charge distribution of the designed protein can be specified 
more freely. 

A candidate IPBD must meet the following criteria: 1) stability under the conditions of its intended use (the domain
may comprise the entire protein that will be inserted, e.g., BPTI), 2) knowledge of the amino acid sequence is obtainable,
3) identification of the residues on the outer surface, and their spatial relationships, and 4) availability of a molecule,
AfM(IPBD), having high specific affinity for the IPBD.

Preferably, the IPBD is no larger than necessary because it is easier to arrange restriction sites in smaller amino-acid
sequences. The usefulness of candidate IPBDs that meet all of these requirements depends on the availability
of the information discussed below.

Information used to judge IPBD suitability includes: 1) a 3D structure (knowledge strongly preferred), 2) one or
more sequences homologous to the IPBD (the more homologous sequences known, the better), 3) the pI of the IPBD
(knowledge necessary in some cases), 4) the stability and solubility as a function of temperature, pH, and ionic strength
(preferably known to be stable over a wide range and soluble in conditions of intended use), 5) ability to bind metal
ions such as Ca++ or Mg++ (knowledge preferred; binding per se, no preference), 6) enzymatic activities, if any (knowledge
preferred; activity per se has uses but may cause problems), 7) binding properties, if any (knowledge preferred;
specific binding also preferred), 8) availability of a molecule having specific and strong affinity (K_d < 10^-11 M) for the
IPBD (preferred), 9) availability of a molecule having specific and medium affinity (10^-6 M < K_d < 10^-3 M) for the IPBD
(preferred), 10) the sequence of a mutant of IPBD that does not bind to the affinity molecule(s) (preferred), and 11)
absorption spectrum in visible, UV, NMR, etc. (characteristic absorption preferred).

If only one species of molecule having affinity for IPBD (AfM(IPBD)) is available, it will be used to: a) detect the
IPBD on the GP surface, b) optimize expression level and density of the affinity molecule on the matrix (Sec. 10.1),
and c) determine the efficiency and sensitivity of the affinity separation (Secs. 10.2 and 10.3). As noted above, however,
one would prefer to have available two species of AfM(IPBD), one with high and one with moderate affinity for the
IPBD. The species with high affinity would be used in initial detection and in determining efficiency and sensitivity (10.2
and 10.3), and the species with moderate affinity would be used in optimization (10.1).

For at least 20 candidate IPBDs the above information is available or is practical to obtain, for example, bovine
pancreatic trypsin inhibitor (BPTI, 58 residues), crambin (46 residues), third domain of ovomucoid (56 residues), T4
lysozyme (164 residues), and azurin (128 residues).

Most of the PBDs derived from a PPBD according to the process of the present invention affect residues having
side groups directed toward the solvent. Exposed residues can accept a wide range of amino acids, while buried
residues are more limited in this regard (REID88). Surface mutations typically have only small effects on melting
temperature of the PBD, but may reduce the stability of the PBD. Hence the chosen IPBD should have a high melting
temperature (60°C acceptable, the higher the better) and be stable over a wide pH range (8.0 to 3.0 acceptable; 11.0
to 2.0 preferred), so that the SBDs derived from the chosen IPBD by mutation and selection-through-binding will retain
sufficient stability. Preferably, the substitutions in the IPBD yielding the various PBDs do not reduce the melting point
of the domain below 50°C.

Two general characteristics of the target molecule, size and charge, make certain classes of IPBDs more likely
than other classes to yield derivatives that will bind specifically to the target. Because these are very general characteristics,
one can divide all targets into six classes: a) large positive, b) large neutral, c) large negative, d) small positive,
e) small neutral, and f) small negative. A small collection of IPBDs, one or a few corresponding to each class of target,
will contain a preferred candidate IPBD for any chosen target.

Alternatively, the user may elect to engineer a GP(IPBD) for a particular target; Sec. 2.1 gives criteria that relate
target size and charge to the choice of IPBD.

Sec. 2.1: Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule, a preferred embodiment of the IPBD is a small protein such as
BPTI from Bos taurus (58 residues), crambin from rape seed (46 residues), or the third domain of ovomucoid from
Coturnix coturnix japonica (Japanese quail) (56 residues) (PAPA82), because targets from this class have clefts and
grooves that can accommodate small proteins in highly specific ways. If the target is a macromolecule lacking a compact
structure, such as starch, it should be treated as if it were a small molecule. Extended macromolecules with defined
3D structure, such as collagen, should be treated as large molecules.

If the target is a small molecule, such as a steroid, a preferred embodiment of the IPBD is a protein the size of
ribonuclease from Bos taurus (124 residues), ribonuclease from Aspergillus oryzae (104 residues), hen egg white
lysozyme from Gallus gallus (129 residues), azurin from Pseudomonas aeruginosa (128 residues), or T4 lysozyme
(164 residues), because such proteins have clefts and grooves into which the small target molecules can fit. The
Brookhaven Protein Data Bank contains 3D structures for these proteins. Genes encoding proteins as large as T4
lysozyme can be manipulated by standard techniques for the purposes of this invention.

If the target is a mineral that is insoluble in water, one must consider the nature of the mineral's molecular surface.
Smooth surfaces (such as crystalline silicon) require medium to large proteins (such as ribonuclease) as IPBD in order
to have sufficient contact area and specificity. Rough, grooved surfaces (such as zeolites) could be bound either by small
proteins (BPTI) or larger proteins (T4 lysozyme).

The target material may for example be selected from a non-macromolecular organic compound, in which case 
the potential binding domains may comprise greater than about 80 amino acid residues, and a macromolecular organic 
compound, in which case the potential binding domains may have less than about 80 amino acid residues. 

Sec. 2.2: Influence of target charge on choice of IPBD:

Electrostatic repulsion between molecules of like charge can prevent molecules with highly complementary surfaces
from binding. Therefore, it is preferred that, under the conditions of intended use, the IPBD and the target molecule
either have opposite charge or that one of them is neutral. Inclusion of counter ions can reduce or eliminate electrostatic
repulsion.
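
The charge preference stated above can be written as a small predicate. This is only an illustrative sketch: the function name and the representation of net charge as a signed integer under the conditions of intended use are assumptions made for the example, and the effect of counter ions is not modeled.

```python
def charge_compatible(ipbd_charge: int, target_charge: int) -> bool:
    """Return True when the pair meets the stated preference:
    opposite net charges, or at least one partner neutral.

    Charges are hypothetical net charges (in units of e) under the
    intended conditions of use; counter ions are ignored here.
    """
    if ipbd_charge == 0 or target_charge == 0:
        return True  # one partner neutral: no like-charge repulsion
    # Compatible only if the signs differ.
    return (ipbd_charge > 0) != (target_charge > 0)


def compatible_ipbds(candidates: dict, target_charge: int) -> list:
    """Filter a {name: net_charge} mapping of candidate IPBDs."""
    return sorted(name for name, q in candidates.items()
                  if charge_compatible(q, target_charge))
```

For example, with a hypothetical candidate set `{"A": 4, "B": 0, "C": -2}` and a target of net charge +3, `compatible_ipbds` keeps "B" and "C".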

Sec. 2.3: Other aspects of choice of IPBD: 

If the chosen IPBD is an enzyme, it may be necessary to change one or more residues in the active site to inactivate
enzyme function. For example, if the IPBD were T4 lysozyme and the GP were E. coli cells or M13, we would inactivate
the lysozyme lest it lyse the cells. If, on the other hand, the GP were PhiX174, then inactivation of lysozyme may not
be needed because T4 lysozyme can be overproduced inside E. coli cells without detrimental effects and PhiX174
forms intracellularly. It is preferred to inactivate enzyme IPBDs that might be harmful to the GP or its host by substituting
mutant amino acids at one or more residues of the active site. It is permitted to vary one or more of the residues that
were changed to abolish the original enzymatic activity of the IPBD. Those GPs that receive osp-pbd genes encoding
an active enzyme may die, but the majority of sequences will not be deleterious.

Sec. 3: Choice of OCV: 

The OCV is preferably small, e.g., less than 10 kb. It is desirable that cassette mutagenesis be practical in the
OCV; preferably, at least 25 restriction enzymes are available that do not cut the OCV. It is likewise desirable that
single-stranded mutagenesis be practical. Finally, the OCV preferably carries a selectable marker. A suitable OCV is
obtained or is engineered by manipulation of available vectors. Plasmids are preferred over the bacterial chromosome
because genes on plasmids are much more easily constructed and mutated than are chromosomal genes. When
bacteriophage are to be used, the osp-ipbd gene must be inserted into the phage genome.
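
The criterion that many restriction enzymes should not cut the OCV can be checked mechanically. The following is a minimal sketch, not part of the disclosure: the function names are hypothetical, the enzyme table is supplied by the caller, and sites are treated as exact (non-degenerate) recognition sequences. A circular vector is handled by wrapping the sequence end onto its start.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def non_cutters(vector: str, enzymes: dict, circular: bool = True) -> list:
    """Names of enzymes whose recognition site is absent from the vector.

    `enzymes` maps an enzyme name to its recognition sequence.
    Both strands are checked; for a circular vector, sites that span
    the arbitrary start/end junction are also found.
    """
    seq = vector.upper()
    longest = max((len(s) for s in enzymes.values()), default=1)
    search = seq + seq[:longest - 1] if circular else seq
    return sorted(
        name for name, site in enzymes.items()
        if site.upper() not in search and revcomp(site) not in search
    )
```

Usage: `len(non_cutters(ocv_sequence, enzyme_table)) >= 25` would express the stated preference, where `ocv_sequence` and `enzyme_table` are whatever vector sequence and enzyme list the user has at hand.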

For phage such as M13, an antibiotic resistance gene is engineered into the genome (HINE80). More virulent 
phage, such as PhiX174, make discernible plaques that can be picked, in which case a resistance gene is not essential;
furthermore, there is no room in the PhiX174 virion to add any new genetic material. Inability to include an antibiotic 
resistance gene is a disadvantage because it limits the number of GPs that can be screened. 

It is preferred that GP(IPBD) carry a selectable marker not carried by wtGP. It is also preferred that wtGP carry a
selectable marker not carried by GP(IPBD).

Sec. 4: Designing the osp-ipbd gene insert:


We design an amino acid sequence that will cause the IPBD to appear on the GP surface when it is expressed.
This amino acid sequence may determine the entire coding region of the osp-ipbd gene, or it may contain only the ipbd
sequence adjoining restriction sites into which random DNA will be cloned (Sec. 6.2).

The actual gene may be produced by any means. The pbd segment, derived from the ipbd segment, must be
easily genetically manipulated in the ways described in Part III. Synthetic ipbd segments are preferred because they
allow greatest control over placement of restriction sites.

Sec. 4.1: Genetic regulation of the osp-ipbd gene:

Regarding regulation of the osp-ipbd gene, the two important questions are: a) how much OSP-IPBD do we need 
on each GP, and b) how accurately must we regulate the amount? 

The essential function of the affinity separation is to separate GPs that bear PBDs (derived from IPBD) having 
high affinity for the target from GPs bearing PBDs having low affinity for the target. If a gradient of some solute, such 
as increasing salt, changes the conditions, then all weakly-binding PBDs will cease to bind before any strongly-binding
PBDs cease to bind. Regulation of the osp-pbd gene must be such that all packages display sufficient PBD to effect
a good separation in Sec. 15. If the amount of PBD/GP had an effect on the elution volume of the GP from the affinity
matrix, then we would need to regulate the amount of PBD/GP accurately. The following analysis shows that there is 
no strong linear effect of IPBD/GP on elution volume and assumes only: a) that all GPs are the same size, b) that 
interactions between the PBDs and the affinity matrix dominate differential elution of GPs, c) that the system is at 
equilibrium, and d) that all PBDs on any one GP are identical. 

If N_p identical PBDs on a GP each have access to target molecules, and each PBD has a free energy of binding
to the target of delta G_b, then the total free energy of binding is

delta G_b^tot = N_p * delta G_b.

Delta G_b is a function of parameters of the solvent, such as: 1) concentration of ions, 2) pH, 3) temperature, 4) concentration
of neutral solutes such as sucrose, glucose, ethanol, etc., and 5) specific ions, such as calcium, acetate, benzoate,
nicotinate, etc. If conditions are altered during affinity separation so that delta G_b approaches zero, delta G_b^tot
approaches zero N_p times faster. As delta G_b^tot goes to or above zero, the packages will dissociate from the immobilized
target molecules and be eluted.

GPs bearing more PBDs have a sharper transition between bound and unbound than packages with fewer of the 
same PBDs. For equilibrium conditions, the midpoint of the transition is determined only by the solution conditions that 
bring the individual interactions to zero free-energy. The number of PBDs/GP determines the sharpness of the transi- 
tion. 
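
The qualitative argument above (midpoint fixed by the per-PBD free energy, sharpness set by the number of PBDs per GP) can be illustrated numerically. The sketch below is an assumption layered on the text: it adopts a simple two-state Boltzmann model for the bound fraction, which the disclosure does not specify, and uses the additive total free energy (number of PBDs times the per-PBD free energy of binding). Function and parameter names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_bound(dg_per_pbd: float, n_pbd: int, temp_k: float = 298.0) -> float:
    """Two-state estimate of the bound fraction of a GP.

    dg_per_pbd : free energy of binding per PBD in kcal/mol
                 (negative values favor binding).
    n_pbd      : number of identical PBDs per GP with access to target.
    Assumes total free energy = n_pbd * dg_per_pbd, as in the text,
    and a Boltzmann two-state population (an illustrative assumption).
    """
    total = n_pbd * dg_per_pbd
    return 1.0 / (1.0 + math.exp(total / (R_KCAL * temp_k)))


if __name__ == "__main__":
    # Sweep the per-PBD free energy through zero for 1, 5, and 10 PBDs/GP.
    for dg in (-1.0, -0.5, 0.0, 0.5, 1.0):
        row = [round(fraction_bound(dg, n), 3) for n in (1, 5, 10)]
        print(dg, row)
```

In this model every curve passes through one half at a per-PBD free energy of zero, whatever the number of PBDs, while larger numbers of PBDs make the change from bound to unbound steeper.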

It should also be noted that the number of PBDs/GP is usually influenced by physiological conditions so that a
sample of genetically identical GP(PBD)s may contain GPs having different numbers of PBDs on the GP surface. In
a population of GP(vgPBD)s each PBD sequence will appear on more than one GP, and the actual number of PBDs/GP
will vary from GP to GP within some range. Within a variegated population of PBDs, let PBD_x be the PBD with
maximum affinity for the target. If there is a linear effect on elution volume of number of PBDs/GP, then the GPs having
the greatest number of PBD_x will be most retarded on the column. When we culture the enriched population, the
GP(PBD_x) will be amplified and give rise to new GP(PBD_x)s having varying numbers of PBD_x/GP. Thus the affinity separation
process of the present invention could tolerate a linear effect of number of PBDs/GP on the elution volume of
the GP(PBD) unless strong binding to target fortuitously causes the PBD to be displayed on the GP only in low number.

Since there is no linear effect on elution volume from the number of IPBDs/GP, need for highly accurate regulation
of IPBD/GP is not anticipated. Reproducible gene expression is more easily controlled using regulated rather than
constitutive genetic elements. The analysis above assumes that GP(IPBD)s are in equilibrium between solution in
buffer and bound to the affinity matrix. Rate of elution may be an important parameter in column affinity chromatography.
In batch elution from an affinity matrix or elution from an affinity plate, the time that each buffer is in contact with the
affinity material may be an important variable. The density of affinity molecules on the matrix is an important variable
in optimizing the affinity separation. Because the analysis above is qualitative, in Sec. 10 of the preferred embodiment
we experimentally optimize: 1) the density of IPBD on the GP surface, 2) the density of affinity molecules on the affinity
matrix, 3) the initial ionic strength, 4) the elution rate, and 5) the quantity of GP/(volume of matrix) to be loaded on the
column.

Transcriptional regulation of gene expression is best understood and most effective, so we focus our attention on
the promoter. A number of promoters are known that can be controlled by specific chemicals added to the culture
medium. For example, the lacUV5 promoter is induced if isopropylthiogalactoside is added to the culture medium, for
example, at between 1.0 uM and 10.0 mM. Hereinafter, "XINDUCE" denotes a chemical that
induces expression of a gene. If transcription of the osp-ipbd gene is controlled by XINDUCE, then the number of
OSP-IPBDs per GP increases for increasing concentrations of XINDUCE until a fall-off in the number of viable packages is
observed or until sufficient IPBD is observed on the surface of harvested GP(IPBD)s.

The attributes that affect the maximum number of OSP-IPBDs per GP are primarily structural in nature. There may
be steric hindrance or other unwanted interactions between IPBDs if OSP-IPBD is substituted for every wild-type OSP.
Excessive levels of OSP-IPBD may also adversely affect the solubility or morphogenesis of the GP. For cellular and
viral GPs, as few as five copies of a protein having affinity for another immobilized molecule have resulted in successful
affinity separations (FERE82a, FERE82b, and SMIT85).

Another consideration of promoter regulation is that it is useful later to know the range of regulation of the osp-ipbd
gene (Sec. 8). In particular, one should determine how nearly the absence of XINDUCE leads to the absence of IPBD
on the GP surface; a non-leaky promoter is preferred. Non-leakiness is useful: a) to show that affinity of GP(osp-ipbd)s
for AfM(IPBD) is due to the osp-ipbd gene, and b) to allow growth of GP(osp-pbd) in the absence of XINDUCE if the
expression of osp-pbd is disadvantageous. The lacUV5 promoter in conjunction with the LacIq repressor is a preferred
example.

Sec. 4.2: DNA sequence design:

The present invention is not limited to a single method of gene design. The following procedure is an example of 
one method of gene design that fills the needs of the present invention. 

If the amino-acid sequence of OSP-IPBD is a definite sequence, then the entire gene will be constructed (Sec. 
6.1). If random DNA is to be fused to ipbd, then a "display probe" is constructed first; the random DNA is then inserted
to complete the population of putative osp-ipbd genes (Sec. 6.2) from which a functional osp-ipbd gene is identified 
by in vivo selection or kindred techniques. 

One may use any genetic engineering method to produce the correct gene fusion, so long as one can easily and 
accurately direct mutations to specific sites in the pbd DNA subsequence (Sec. 14.1). For the methods of mutagenesis 
considered here, however, the DNA sequence for the osp-ipbd gene must be different from any other DNA in the OCV. 
The degree and nature of difference needed is determined by the method of mutagenesis. If one replaces subsequences
coding for the PBD with vgDNA, then subsequences to be mutagenized must be bounded by restriction sites that are
unique within the OCV. If single-stranded-oligonucleotide-directed mutagenesis is to be used, then the DNA sequence
of the subsequence coding for the IPBD must be unique within the OCV.

Regulatory elements include: a) promoters, b) Shine-Dalgarno sequences, and c) transcriptional terminators, and 
may be isolated from nature or designed from knowledge of consensus sequences of natural regulatory regions. 

The coding portions of genes to be synthesized are designed at the protein level and then encoded in DNA. The
amino acid sequences are chosen to achieve various goals, including: a) display of an IPBD on the surface of a GP, b)
change of charge on an IPBD, and c) generation of a population of PBDs from which to select an SBD. The ambiguity
in the genetic code is exploited to allow optimal placement of restriction sites and to create various distributions of
amino acids at variegated codons.

Sec. 4.3: Specific DNA sequence assignment: 

A computer program may be used to identify all possible ambiguous DNA sequences coding for an amino-acid 
sequence given by the user and to identify places where recognition sites for site-specific restriction enzymes could 
be provided without altering the amino-acid sequence. 

Restriction sites are positioned within the osp-ipbd gene so that the longest segment between sites is as short as 
possible. Enzymes that produce cohesive ends are preferred. The codon preferences of the intended host and the
secondary structure of the messenger RNA are also considered. 
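
A program of the kind described above can be sketched briefly. The following is a minimal illustration under stated assumptions, not the program used by the inventors: it uses the standard genetic code, treats the recognition site as an exact sequence, and ignores codon preference and mRNA secondary structure. The function names are hypothetical. It reports every nucleotide offset in the coding sequence at which the site could be placed by choosing synonymous codons only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Amino acid -> list of codons that encode it.
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def _codon_can_match(aa: str, offset: int, site: str) -> bool:
    """True if some codon for `aa` agrees with `site` where they overlap.

    `offset` is the index in `site` aligned with codon position 0;
    it may be negative or run past the end of the site.
    """
    for codon in SYNONYMS[aa]:
        if all(codon[k] == site[offset + k]
               for k in range(3) if 0 <= offset + k < len(site)):
            return True
    return False


def silent_site_positions(protein: str, site: str) -> list:
    """0-based nucleotide offsets where `site` fits without changing `protein`."""
    protein, site = protein.upper(), site.upper()
    hits = []
    for p in range(3 * len(protein) - len(site) + 1):
        first, last = p // 3, (p + len(site) - 1) // 3
        # Codons are independent, so each covered codon is checked on its own.
        if all(_codon_can_match(protein[ci], 3 * ci - p, site)
               for ci in range(first, last + 1)):
            hits.append(p)
    return hits
```

For instance, `silent_site_positions("HAC", "GCATGC")` finds that an Sph I site can be encoded across the Ala-Cys codons (GCA TGC) starting at nucleotide offset 3.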

Sec. 5.1: Organization of gene synthesis: 

An established strategy for gene synthesis is to synthesize both strands of the entire gene in overlapping segments 
of 20 to 50 nucleotides (nts) (THER88). We prefer an alternative method that is more suitable for synthesis of vgDNA. 
Our method differs from previous methods (OLIP86, OLIP87, AUSU87) in that we: a) use two synthetic strands, and 
b) do not cut the extended DNA in the middle. Our goals are: a) to produce longer pieces of dsDNA than can be 
synthesized as ssDNA on commercial DNA synthesizers, and b) to produce strands complementary to single-stranded
vgDNA. By using two synthetic strands, we remove the requirement for a palindromic sequence at the 3' end.

DNA synthesizers can produce oligo-nts of up to 100 nts in reasonable yield, M_DNA = 100. The parameters N_w
(the length of overlap needed to obtain efficient annealing) and N_s (the number of spacer bases needed so that a
restriction enzyme can cut near the end of blunt-ended dsDNA) are determined by DNA and enzyme chemistry. N_w =
10 and N_s = 5 are reasonable values.

We divide the DNA sequence to be synthesized into two nearly equal parts, each 5-8 bases longer than half the
total length, so that there is an overlap between the two parts of 10 to 16 bp (N_w) containing no variegated bases. The
overlap preferably is not palindromic and has high GC content. We synthesize the overlap portion and the 5' extension
of each strand. When these strands are annealed and completed with Klenow enzyme and all four dNTPs, we obtain
the desired sequence as blunt-ended dsDNA. If the DNA is to be ligated to other DNA having cohesive ends, five to
ten (N_s) bases are added to that end. The synthetic dsDNA can then be cut efficiently with an appropriate restriction
enzyme (OLIP87).

Because M_DNA is not rigidly fixed at 100, the current limits of 190 (= 2 M_DNA - N_w) nts overall and 100 in each
fragment are not rigid, but can be exceeded by 5 or 10 nts. Going beyond the limits of 190 and 100 will lead to lower
yields, but these may be acceptable in certain cases.
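
The two-strand scheme and its length limit can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the sequence is assumed to contain only A, C, G, and T, and the checks for a non-palindromic, GC-rich, non-variegated overlap described above are not performed.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def max_duplex_length(max_oligo: int = 100, overlap: int = 10) -> int:
    """Longest duplex obtainable from two oligos: 2 * max_oligo - overlap."""
    return 2 * max_oligo - overlap


def split_for_extension(seq: str, overlap: int = 10) -> tuple:
    """Split `seq` into two oligos whose 3' ends anneal over `overlap` bases.

    Returns (top, bottom), both written 5'->3'. `top` is the 5' part of
    the desired strand through the overlap; `bottom` is the reverse
    complement of the remainder, starting at the overlap.
    """
    if len(seq) < overlap:
        raise ValueError("sequence shorter than the requested overlap")
    start = (len(seq) - overlap) // 2
    return seq[:start + overlap], revcomp(seq[start:])


def fill_in(top: str, bottom: str, overlap: int = 10) -> str:
    """Top strand of the duplex expected after annealing and polymerase fill-in."""
    lower_as_top = revcomp(bottom)
    if top[-overlap:] != lower_as_top[:overlap]:
        raise ValueError("3' ends are not complementary over the overlap")
    return top + lower_as_top[overlap:]
```

With the default values, a 190-nt target splits into two 100-nt oligos, matching the limit given in the text.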

Sec. 5.2: DNA synthesis and purification methods:

The present invention is not limited to any particular method of DNA synthesis or construction. 

In the preferred embodiment, DNA is synthesized by standard means on a Milligen 7500 DNA synthesizer. The
Milligen 7500 has seven vials from which phosphoramidites may be taken. Normally, the first four contain A, C, T, and
G. The other three vials may contain unusual bases such as inosine or mixtures of bases, the so-called "dirty bottle".
The standard software allows programmed mixing of two, three, or four bases in equimolar quantities.
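
Equimolar mixing of bases at each position of a codon determines which amino acids can appear at a variegated position and in what proportions. The sketch below tabulates this for a degenerate codon written in IUPAC ambiguity codes. It is an illustration under stated assumptions (standard genetic code, strictly equimolar incorporation, only the listed ambiguity codes); the function name is hypothetical and "NNK" is used purely as an example.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# IUPAC ambiguity codes for mixtures of two or four bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "N": "ACGT",
}


def codon_distribution(degenerate_codon: str) -> Counter:
    """Count how many equally likely codons encode each amino acid.

    '*' denotes a stop codon. Equimolar mixing is assumed, so each
    concrete codon in the mixture is taken to be equally probable.
    """
    pools = [IUPAC[b] for b in degenerate_codon.upper()]
    return Counter(CODON_TABLE["".join(c)] for c in product(*pools))


if __name__ == "__main__":
    dist = codon_distribution("NNK")
    total = sum(dist.values())
    for aa, n in sorted(dist.items()):
        print(aa, n, round(n / total, 3))
```

Under these assumptions, NNK (N = any base, K = G or T) gives 32 codons covering all twenty amino acids with a single stop codon.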

The present invention is not limited to any particular method of purifying DNA for genetic engineering. Agarose
gel electrophoresis and electroelution on an IBI device (International Biotechnologies, Inc., New Haven, CT) is preferably
used to purify large dsDNA fragments. For oligo-nts, PAGE and electroelution with an Epigene device (Epigene
Corp., Baltimore, MD) are an alternative to HPLC.

Sec. 6.1: Cloning of Known osp-ipbd gene into OCV:

In the preferred method, the synthetic gene is constructed using plasmids that are transformed into bacterial cells 
by standard methods (MANI82, p. 250) or slightly modified standard methods. Alternatively, DNA fragments derived
from nature are operably linked to other fragments of DNA derived from nature or to synthetic DNA fragments. In most 
cases of the preferred method, gene synthesis involves construction of a series of plasmids containing larger and larger 
segments of the complete gene. 

Sec. 6.2: Cloning of Random DNA (Potential osp) into Display Probe:

If random DNA and phenotypic selection or screening are used to obtain a GP(IPBD), then we clone random DNA
into one of the restriction sites that was designed into the display probe.

The random DNA may be obtained in a variety of ways. Degenerate synthetic DNA is one possibility. Alternatively,
pseudorandom DNA may be taken from nature. If, for example, an Sph I site (GCATG/C) has been designed into the
display probe at one end of the ipbd fragment, then we would use Nla III (CATG/) to partially digest DNA that contains
a wide variety of sequences, generating a wide variety of fragments with CATG 3' overhangs. Preferably, the display
probe has different restriction sites at each end of the ipbd gene so that random DNA can be cloned at either end.

A plasmid carrying the display probe is digested with the appropriate restriction enzyme and the fragmented,
random DNA is annealed and ligated by standard methods. The ligated plasmids are used to transform cells that are
grown and selected for expression of the antibiotic-resistance gene. Plasmid-bearing GPs are then selected for the
display-of-IPBD phenotype by the procedure given in Sec. 15 of the present invention using AfM(IPBD) as if it were
the target. Sec. 15 is designed to isolate GP(PBD)s that bind to a target from a large population that do not bind.

Sec. 7: Harvest of GPs:

Cells are transformed with ligated OCVs and selected for uptake of OCV after an appropriate incubation with an
agent appropriate to the selectable markers on the OCV. GPs are harvested by methods appropriate to the GP at hand,
generally, centrifugation to pellet the GPs and resuspension of the pellets in sterile medium (cells) or buffer (spores or
phage).

Sec. 8: Verification of Display Strategy: 

The harvested packages are now tested for display of IPBD on the surface; any ions or cofactors known to be
essential for the stability of IPBD or AfM(IPBD) must be included at appropriate levels. The tests can be done: a) by
affinity labeling, b) enzymatically, c) spectrophotometrically, d) by affinity separation, or e) by affinity precipitation. The
AfM(IPBD) in this step is one picked to have strong affinity (preferably, K_d < 10^-11 M) for the IPBD molecule and little
or no affinity for the wtGP. For example, if BPTI were the IPBD, trypsin, anhydrotrypsin, or antibodies to BPTI could
be used as the AfM(BPTI) to test for the presence of BPTI. Anhydrotrypsin, a trypsin derivative with serine 195 converted
to dehydroalanine, has no proteolytic activity but retains its affinity for BPTI (AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the surface of the GP is demonstrated through the use of a soluble, labeled
derivative of an AfM(IPBD) with high affinity for IPBD. The labeled derivative of AfM(IPBD) is denoted as AfM(IPBD)*.

If random DNA has been used, then the procedures of Sec. 15 are used to obtain a clonal isolate that has the 
display-of-IPBD phenotype. Alternatively, clonal isolates may be screened for the display-of-IPBD phenotype. The tests 
of this step are applied to one or more of these clonal isolates. 

If no isolates that bind to the affinity molecule are obtained we take corrective action as disclosed in Sec. 9. 

If one or more of the tests indicates that the IPBD is displayed on the GP surface, we verify that the binding of
molecules having known affinity for IPBD is due to the chimeric osp-ipbd gene through the use of standard genetic
and biochemical techniques, such as:

1) transferring the osp-ipbd gene into the parent GP to verify that osp-ipbd confers binding, 

2) deleting the osp-ipbd gene from the isolated GP to verify that loss of osp-ipbd causes loss of binding, 

3) showing that binding of GPs to AfM(IPBD) correlates with [XINDUCE] (in those cases that expression of
osp-ipbd is controlled by [XINDUCE]), and

4) showing that binding of GPs to AfM(IPBD) is specific to the immobilized AfM(IPBD) and not to the support matrix.

Presence of IPBD on the GP surface is indicated by a strong correlation between [XINDUCE] and the reactions
that are linear in the amount of IPBD (such as: a) binding of GPs by soluble AfM(IPBD)*, b) absorption caused by
IPBD, and c) biochemical reactions of IPBD). The demonstration (4) that binding is to AfM(IPBD) and the genetic tests
(1) and (2) are important; the test with XINDUCE (3) is less so.

We sequence the relevant ipbd gene fragment from each of several clonal isolates to determine the construction. 

We establish the maximum salt concentration and pH range for which the GP(IPBD) binds the chosen AfM(IPBD). 

If the IPBD is displayed on the outside of the GP, and if that display is clearly caused by the introduced osp-ipbd 
gene, we proceed to Part II, otherwise we must analyze the result and adopt appropriate corrective measures. 

Sec. 9: Perfecting the Display System:

If we have attempted to fuse an ipbd fragment to a natural osp fragment, our options are:

1) pick a different fusion to the same osp by

a) using the opposite end of osp,

b) keeping more or fewer residues from osp in the fusion; for example, in increments of 3 or 4 residues,

c) trying a known or predicted domain boundary, 

d) trying a predicted loop or turn position, 

2) pick a different osp, or 

3) switch to random DNA method. 

If we have just tried the random DNA method unsuccessfully, our options are : 

1) choose a different relationship between ipbd fragment and random DNA (ipbd first, random DNA second or vice
versa),

2) try a different degree of partial digestion, a different enzyme for partial digestion, a different degree of shearing 
or a different source of natural DNA, or 

3) switch to the natural OSP method.

If all reasonable OSPs of the current GP have been tried and the random DNA method has been tried, both without
success, we pick a new GP.

Part II

Sec. 10.0: Affinity Separation Means: 



In Part II we optimize an affinity separation system that will be used in Part III to enrich a population of GP(vgPBD)s
for those GP(PBD)s that display PBDs with increased affinity for the target.

Affinity chromatography is the preferred means, but FACS, electrophoresis, or other means may also be used. 

Sec. 10.1: Optimization of Affinity Chromatography Separation : 






Changes in eluant concentration cause GPs to elute from the column. Elution volume, however, is more easily 
measured and specified. It is to be understood that the eluant concentration is the agent causing GP release and that 
an eluant concentration can be calculated from an elution volume and the specified gradient. 

Using a specified elution regime, we compare the elution volumes of GP(IPBD)s with the elution volumes of wtGP
on affinity columns supporting AfM(IPBD). Comparisons are made at various: a) amounts of IPBD/GP, b) densities of
AfM(IPBD)/(volume of matrix) (DoAMoM), c) initial ionic strengths, d) elution rates, e) amounts of GP/(volume of
support), f) pHs, and g) temperatures, because these are the parameters most likely to affect the sensitivity and efficiency
of the separation. We then pick those conditions giving the best separation.

We do not optimize pH or temperature; rather we record optimal values for the other parameters for one or more
values of pH and temperature. The conditions of intended use, specified by the user (Sec. 11), may include a
specification of pH or temperature. If pH is specified, then pH will not be varied in eluting the column (Sec. 15.3). Decreasing
pH may be used to liberate bound GPs from the matrix. If the intended use specifies a temperature, we will hold the
affinity column at the specified temperature during elution, but we might vary the temperature during recovery.

The AfM(IPBD) is preferably one known to have moderate affinity for the IPBD (K_d in the range 10^-6 M to
10^-[exponent illegible] M). When populations of GP(vgPBD)s are fractionated, there will be roughly three subpopulations:
a) those with no binding, b) those that have some binding but can be washed off with high salt or low pH, and c) those
that bind very tightly and must be rescued in situ. We optimize the parameters to separate (a) from (b) rather than (b)
from (c). Let PBD_W be a PBD having weak binding to the target and PBD_S be a PBD having strong binding. Higher
DoAMoM might, for example, favor retention of GP(PBD_W) but also make it very difficult to elute viable GP(PBD_S).
We will optimize the affinity separation to retain GP(PBD_W) rather than to allow release of GP(PBD_S) because a
tightly bound GP(PBD_S) can be rescued by in situ growth. If we find that DoAMoM strongly affects the elution volume,
then in Part III we may reduce the amount of target on the affinity column when an SBD has been found with moderately
strong affinity (K_d on the order of 10^-7 M) for the target.

In this step, we measure elution volumes of genetically pure GPs that elute from the affinity matrix as sharp bands
that can be detected by UV absorption. Samples from effluent fractions are plated on suitable medium (cells or spores)
or on sensitive cells (phage) and colonies or plaques counted.

Several values of IPBD/GP, DoAMoM, elution rates, initial ionic strengths, and loadings should be examined. We
anticipate that optimal values of IPBD/GP and DoAMoM will be correlated and therefore should be optimized together.
The effects of initial ionic strength, elution rate, and amount of GP/(matrix volume) are unlikely to be strongly correlated,
and so they can be optimized independently.

For each set of parameters to be tested, the column is eluted in a specified manner. For example, we may use a
regime called Elution Regime 1: a KCl gradient runs from 10 mM to the maximum allowed for GP(IPBD) viability in 100
fractions of 0.05 V_v (void volume), followed by 20 fractions of 0.05 V_v at the maximum allowed KCl; pH of the buffer is
maintained at the specified value with a convenient buffer such as Tris. It is important that the conditions of this
optimization be similar to the conditions that are used in Part III for selection for binding to target (Sec. 15.3) and recovery
of GPs from the chromatographic system (Sec. 15.4).
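
As an illustration of how an eluant concentration can be calculated from an elution volume and the specified gradient, the following is a minimal Python sketch of Elution Regime 1. It assumes that the gradient is linear across the gradient fractions, which the text does not state; the function names and the 500 mM maximum used in the example are hypothetical.

```python
def elution_regime_1(kcl_max_mM, kcl_start_mM=10.0, gradient_fractions=100,
                     hold_fractions=20, fraction_volume_vv=0.05):
    """Return a list of (cumulative volume in V_v, [KCl] in mM), one per fraction.

    Assumption: the KCl gradient is linear from kcl_start_mM to kcl_max_mM.
    """
    schedule = []
    for i in range(gradient_fractions + hold_fractions):
        end_volume = (i + 1) * fraction_volume_vv
        if i < gradient_fractions:
            step = i / (gradient_fractions - 1)
            conc = kcl_start_mM + step * (kcl_max_mM - kcl_start_mM)
        else:
            # Hold fractions at the maximum allowed KCl.
            conc = kcl_max_mM
        schedule.append((end_volume, conc))
    return schedule


def kcl_at_volume(schedule, volume_vv, tol=1e-9):
    """Eluant concentration of the fraction in which a given elution volume falls."""
    for end_volume, conc in schedule:
        if volume_vv <= end_volume + tol:
            return conc
    raise ValueError("volume lies beyond the end of the elution regime")


# Hypothetical example: maximum KCl tolerated by the GP(IPBD) taken as 500 mM.
schedule = elution_regime_1(kcl_max_mM=500.0)
print(len(schedule), kcl_at_volume(schedule, 2.5))
```

A GP that elutes at a given multiple of V_v can thus be assigned the KCl concentration that released it, provided the same regime is used throughout.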

When the osp-ipbd gene is regulated by [XINDUCE], IPBD/GP can be controlled by varying [XINDUCE].
Appropriate values of [XINDUCE] depend on the identity of [XINDUCE] and the promoter; if, for example, XINDUCE is
isopropylthiogalactoside (IPTG) and the promoter is lacUV5, then [IPTG] = 0, 0.1 uM, 1.0 uM, 10.0 uM, 100.0 uM, and
1.0 mM are appropriate levels to test. The range of variation of [XINDUCE] is extended until an optimum is found or
an acceptable level of expression is obtained.

DoAMoM is varied from the maximum that the matrix material can bind to 1% or 0.1% of this level in appropriate
steps. We anticipate that the efficiency of separation will be a smooth function of DoAMoM so that it is appropriate to
cover a wide range of values for DoAMoM with a coarse grid and then explore the neighborhood of the approximate
optimum with a finer grid.
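
The coarse-then-fine search can be sketched as follows. This is only one possible arrangement: the geometric spacing, the function names, and the separation scores are assumptions made for illustration, and in practice the scores would come from the measured elution volumes.

```python
def log_grid(high, low, points):
    """Geometrically spaced values from high down to low, both inclusive."""
    ratio = (low / high) ** (1.0 / (points - 1))
    return [high * ratio ** i for i in range(points)]


def refine(coarse, scores, points=5):
    """Build a finer grid spanning the neighbors of the best coarse value."""
    best = max(range(len(coarse)), key=lambda i: scores[i])
    upper = coarse[max(best - 1, 0)]
    lower = coarse[min(best + 1, len(coarse) - 1)]
    return log_grid(upper, lower, points)


# DoAMoM expressed as a fraction of the maximum the matrix can bind:
# coarse grid from 100% down to 0.1% of the maximum.
coarse = log_grid(1.0, 0.001, 4)
# Hypothetical separation scores, one per coarse value (higher is better).
scores = [0.2, 0.9, 0.5, 0.1]
fine = refine(coarse, scores)
print(coarse)
print(fine)
```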

Several values of initial ionic strength are tested, such as 1.0 mM, 5.0 mM, 10.0 mM, and 20.0 mM.
The elution rate is varied, by successive factors of 1/2, from the maximum attainable rate to 1/16 of this value. The
fastest elution rate giving good separation is optimal.
The goal of the optimization is to obtain a sharp transition between bound and unbound GPs, triggered by increasing
salt or decreasing pH or a combination of both. This optimization need be performed only: a) for each temperature to
be used, b) for each pH to be used, and c) when a new GP(IPBD) is created.

Regulable promoters are available for all genetic packages except, possibly, bacterial spores. A promoter
functional in bacterial spores might be prepared by constructing a hybrid of a sporulation promoter and a regulatable
bacterial promoter (e.g., lac), or by saturation mutagenesis of a sporulation promoter followed by screening for regulatable
promoter activity (cf. OLIP86, OLIP87). When the promoter of the osp-ipbd gene is not regulatable, we optimize
DoAMoM, the elution rate, and the amount of GP/volume of matrix. If the optimized affinity separation is not acceptable,
we must develop a means to alter the amount of IPBD per GP.

Sec. 10.2: Measuring the sensitivity of affinity separation: 

We determine the sensitivity of the affinity separation (C_sensi) by measuring the minimum quantity of GP(IPBD)
that can be detected in the presence of a large excess of wtGP. The user chooses a number of separation cycles,
denoted N_chrom, that will be performed before an enrichment is abandoned; preferably, N_chrom is in the range 6 to 10,
and N_chrom must be greater than 4. Enrichment can be terminated by isolation of a desired GP(SBD) before N_chrom
passes.

The measurement of sensitivity is significantly expedited if GP(IPBD) and wtGP carry different selectable markers.
Mixtures of GP(IPBD) and wtGP are prepared in the ratios of 1:V_lim, where V_lim ranges by an appropriate factor
(e.g., 1/10) over an appropriate range, typically 10^11 through 10^4. Large values of V_lim are tested first; once a positive
result is obtained for one value of V_lim, no smaller values of V_lim need be tested. Each mixture is applied to a column
supporting, at the optimal DoAMoM, an AfM(IPBD) having high affinity for IPBD, and the column is eluted by the specified
elution regime. The last fraction that contains viable GPs and an inoculum of the column matrix material are cultured.
If GP(IPBD) and wtGP have different selectable markers, then transfer onto selection plates identifies each colony.
Otherwise, a number (e.g., 32) of GP clonal isolates are tested for presence of IPBD by the techniques discussed in
Sec. 8.

If IPBD is not detected on the surface of any of the isolated GPs, then GPs are pooled from: a) the last few (e.g.,
3 to 5) fractions that contain viable GPs, and b) an inoculum taken from the column matrix. The pooled GPs are cultured
and passed over the same column and enriched for GP(IPBD) in the manner described. This process is repeated until
N_chrom passes have been performed, or until the IPBD has been detected on the GPs. If GP(IPBD) is not detected
after N_chrom passes, V_lim is decreased and the process is repeated.

C_sensi equals the highest value of V_lim for which the user can recover GP(IPBD) within N_chrom passes. The number
of chromatographic cycles (K_cyc) that were needed to isolate GP(IPBD) gives a rough estimate of C_eff; C_eff is
approximately the K_cyc-th root of V_lim:

C_eff = (approx.) exp( log_e(V_lim) / K_cyc )

For example, if V_lim were 4.0 x 10^8 and three separation cycles were needed to isolate GP(IPBD), then C_eff = (approx.)
736.
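
The estimate can be checked numerically. The sketch below simply evaluates the K_cyc-th root relation given above; the function name is ours, and the list of V_lim values assumes the typical range of 10^11 through 10^4 in steps of a factor of ten, tested from largest to smallest.

```python
import math


def estimate_c_eff(v_lim, k_cyc):
    """Rough per-cycle enrichment: the k_cyc-th root of v_lim."""
    return math.exp(math.log(v_lim) / k_cyc)


# Ratios 1:V_lim to be tested, largest first.
v_lim_series = [10 ** exponent for exponent in range(11, 3, -1)]

# Worked example from the text: V_lim = 4.0 x 10^8, three cycles.
c_eff = estimate_c_eff(4.0e8, 3)
print(v_lim_series[0], v_lim_series[-1], int(c_eff))
```

The computed value is slightly below 737, which agrees with the approximate figure of 736 after truncation.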

Sec. 10.3: Measuring the efficiency of separation : 

To determine C_eff more accurately, we determine the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD) column
that yields approximately equal amounts of GP(IPBD) and wtGP after elution.

Sec. 10.4: Other Separation Means 

Other separation means are optimized in a manner parallel to that used for affinity chromatography.
FACS (e.g., FACStar from Becton-Dickinson, Mountain View, CA) is most appropriate for bacterial cells and spores
because the sensitivity of the machines requires approximately 1000 molecules of fluorescent label bound to each GP
to accomplish a separation. To optimize FACS separation of GPs, we use a derivative of AfM(IPBD) that is labeled
with a fluorescent molecule, denoted AfM(IPBD)*. The variables that must be optimized include: a) amount of IPBD/GP,
b) concentration of AfM(IPBD)*, c) ionic strength, d) concentration of GPs, and e) parameters pertaining to operation
of the FACS machine. Because AfM(IPBD)* and GPs interact in solution, the binding will be linear in both
[AfM(IPBD)*] and [displayed IPBD]. Preferably, these two parameters are varied together. The other parameters can be
optimized independently.

Electrophoresis is most appropriate to bacteriophage because of their small size (SERW87). Electrophoresis is a
preferred separation means if the target is so small that chemically attaching it to a column or to a fluorescent label
would essentially change the entire target. For example, chloroacetate ions contain only seven atoms and would be
essentially altered by any linkage. GPs that bind chloroacetate would become more negatively charged than GPs that
do not bind the ion and so these classes of GPs could be separated.

The parameters to optimize for electrophoresis include: a) IPBD/GP, b) concentration of gel material (e.g., agarose),
c) concentration of AfM(IPBD), d) ionic strength, e) size, shape, and cooling capacity of the electrophoresis apparatus,
f) voltages and currents, and g) concentration of GPs. Preferably, IPBD/GP and [AfM(IPBD)] are varied at the same
time and other parameters are optimized independently.



Part III



Sec. 11.0: Choice of target material : 

Any material may be chosen as target material, subject only to the following restrictions: 
If affinity chromatography is to be used, then: 

1) the molecules of the target material must be of sufficient size and chemical reactivity to be applied to a solid
support suitable for affinity separation,

2) after application to a matrix, the target material must not react with water, 

3) after application to a matrix, the target material must not bind or degrade proteins in a non-specific way, and 

4) the molecules of the target material must be sufficiently large that attaching the material to a matrix allows
enough unaltered surface area (generally at least 500 Å^2, excluding the atom that is connected to the linker) for
protein binding.

If FACS is to be used as the affinity separation means, then: 

1) the molecules of the target material must be of sufficient size and chemical reactivity to be conjugated to a
suitable fluorescent dye or the target must itself be fluorescent,

2) after any necessary fluorescent labeling, the target must not react with water,

3) after any necessary fluorescent labeling, the target material must not bind or degrade proteins in a nonspecific 
way, and 

4) the molecules of the target material must be sufficiently large that attaching the material to a suitable dye allows
enough unaltered surface area (generally at least 500 Å^2, excluding the atom that is connected to the linker) for
protein binding.

If affinity electrophoresis is to be used, then: 

1) the target must either be charged or of such a nature that its binding to a protein will change the charge of the
protein,

2) the target material must not react with water, 

3) the target material must not bind or degrade proteins in a non-specific way, and 

4) the target must be compatible with a suitable gel material.

[...] dichlorodiphenyltrichloroethane (DDT), benzo(a)pyrene, prostaglandin PGE2, protoporphyrin IX, or actinomycin D), g)
organometallic complexes (such as iron haem or cobalt haem), h) organic polymers (such as cellulose or chitin), i)
insoluble minerals (such as asbestos, zeolites, or hydroxylapatite), j) viral and phage coat proteins (such as influenza
haemagglutinin or phage lambda capsid), and k) bacterial membrane or outer membrane proteins (such as LamB from
E. coli or flagella proteins).

A supply of several milligrams of pure target material is desired. Impure target material could be used, but one 
might obtain a protein that binds to a contaminant instead of to the target. 
The following information about the target material is highly desirable: 

1) stability as a function of temperature, pH, and ionic strength, 

2) stability with respect to chaotropes such as urea or guanidinium CI, 

3) pI,

4) molecular weight, 

5) requirements for prosthetic groups or ions, such as haem or Ca^2+, and

6) proteolytic activity, if any. 

In addition to this most desirable information, it is useful to know 1) the target's sequence, if the target is a
macromolecule, 2) the 3D structure of the target, 3) enzymatic activity, if any, and 4) toxicity, if any.

The user of the present invention specifies certain parameters of the intended use of the binding protein: 

1) the acceptable temperature range,

2) the acceptable pH range, 

3) the acceptable concentrations of ions and neutral solutes, 

4) the maximum acceptable dissociation constant for the target and the SBD:

K_T = [Target][SBD]/[Target:SBD]

In some cases, the user may require discrimination between T, the target, and N, some non-target. Let

K_T = [T][SBD]/[T:SBD],

and

K_N = [N][SBD]/[N:SBD],

then

K_T/K_N = ([T][N:SBD])/([N][T:SBD]).

The user then specifies a maximum acceptable value for the ratio K_T/K_N.
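
The relation between the two dissociation constants and their ratio can be verified with a short calculation. In the sketch below the function names and all concentrations are hypothetical and serve only to show that the free SBD concentration cancels out of the ratio.

```python
import math


def dissociation_constant(free_ligand, free_sbd, complex_conc):
    """K = [ligand][SBD]/[ligand:SBD], all concentrations in mol/L."""
    return free_ligand * free_sbd / complex_conc


def discrimination_ratio(t, n, t_sbd, n_sbd):
    """K_T/K_N = ([T][N:SBD])/([N][T:SBD]); the free [SBD] cancels."""
    return (t * n_sbd) / (n * t_sbd)


# Hypothetical equilibrium concentrations (mol/L).
T, N, SBD = 1e-6, 1e-6, 1e-9
T_SBD, N_SBD = 1e-7, 1e-10

k_t = dissociation_constant(T, SBD, T_SBD)   # 1e-8 M
k_n = dissociation_constant(N, SBD, N_SBD)   # 1e-5 M
ratio = discrimination_ratio(T, N, T_SBD, N_SBD)
print(k_t, k_n, ratio, math.isclose(ratio, k_t / k_n))
```

A small ratio means that the SBD binds the target much more tightly than the non-target; the user's specification is an upper bound on this number.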

If the target material is a general protease, one must consider the following points: 



1) a highly specific protease can be treated like any other target,

2) a general protease, such as subtilisin, may degrade the OSPs of the GP including OSP-PBDs; there are several
alternative ways of dealing with general proteases, including: a) a chemical inhibitor may be used to prevent
proteolysis (e.g., phenylmethylsulfonyl fluoride (PMSF), which inhibits serine proteases), b) one or more active-site residues
may be mutated to create an inactive protein (e.g., a serine protease in which the active serine is mutated to [...]),
or c) the protease may be chemically modified to destroy its catalytic
activity (e.g., a serine protease in which the active serine is converted to anhydroserine),

3) SBDs selected for binding to a protease need not be inhibitors; SBDs that happen to inhibit the protease target
are a fairly small subset of SBDs that bind to the protease target,

4) the more we modify the target protease, the less likely we are to obtain an SBD that inhibits the target protease, and

5) if the user requires that the SBD inhibit the target protease, then the active site of the target protease must not
be modified more than necessary; inactivation by mutation or chemical modification are preferred methods of
inactivation, and a protein protease inhibitor becomes a prime candidate for IPBD. For example, BPTI could be
mutated, by the methods of the present invention, to bind to proteases other than trypsin (TANK77 and TSCH87).

Sec. 12.0: Choice of GP(IPBD):

The user must pick a GP(IPBD) that is suitable to the chosen target according to the criteria of Sec. 2. It is anticipated
that [...] of GP(IPBD)s can be assembled such [...]
by the methods described in Part II.

Sec. 13.0: Identification of Family of PBDs, Related to PPBD, to Be Generated:

Sec. 13.1: Choosing residues on IPBD (or other PPBD) to vary:

[...] the IPBD, b) sequences homologous to IPBD, and [...] varied simultaneously [...] number of residues that could
strongly influence [...]. The user must also pick the levels of [...] so that they can easily be removed from the [...].

Burial of hydrophobic surfaces so that bulk water is excluded is one of the strongest forces driving the binding of
proteins to other molecules. Bulk water can be excluded from the region between two molecules only if the surfaces
are complementary. We must test as many surfaces as possible to find one that is complementary to the target. The
selection-through-binding isolates those proteins that are more nearly complementary to some surface on the target.

[...] number of different surfaces, rather than the [...].

[...] a 2D schematic of 3D proteins [...] 34, 36, 37, 38, and 39 of the IPBD are on the 3D surface of the [...]. Proteins
do not have distinct, countable faces. Therefore we define an interaction set to be a set of residues such that all
members of the set can simultaneously touch one molecule of the target material without any atom of the target coming
closer than van der Waals distance to any main-chain atom of the IPBD. [...] comprises residues 6, 7, 20, 21, [...]
comprises residues 1, [...], 4, 5, 31, [...] and 20 different surfaces for interaction set [...]. Note that [...] varying one
residue through all 20 amino acids yields 20 versions of [...] interaction set [...].

Now consider varying two residues, each through all twenty amino acids, generating 400 protein sequences. If the
two residues varied were, for example, number 1 and number 21, then there would be only 40 different surfaces
because interaction set A does not depend on residue 1 and interaction set B does not depend on residue 21. If the
two residues varied, however, were number 7 and number 21, then 400 surfaces would be generated.

If N spatially separated residues are varied at one time, 20 x N surfaces are generated; variation of N residues in
the same interaction set yields 20^N surfaces. For example, if N = 7, variation of separated residues yields 140 surfaces
while variation of interacting residues yields 20^7 = 1.28 x 10^9 surfaces. Thus, to maximize the number of surfaces
generated when N residues are varied, all residues should be in the same interaction set.
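
The two counts can be reproduced with a few lines of Python. This is a direct transcription of the arithmetic above; the function names are ours.

```python
def surfaces_if_separated(n_residues, amino_acids=20):
    """Each spatially separated residue contributes its own set of surfaces."""
    return amino_acids * n_residues


def surfaces_if_same_set(n_residues, amino_acids=20):
    """Residues in one interaction set combine multiplicatively."""
    return amino_acids ** n_residues


print(surfaces_if_separated(7))   # 140
print(surfaces_if_same_set(7))    # 1280000000
```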

The amount of surface area buried in strong protein-protein interactions ranges from 1000 Å^2 to 2000 Å^2 (SCHU79,
p103ff). Individual amino acids have total surface areas that depend mostly on type of amino acid and weakly on
conformation. These areas range from about 180 Å^2 for glycine to about 360 Å^2 for tryptophan. From amino-acid solvent
exposures of published protein structures, we calculate that 1000 Å^2 on a protein surface comprises between 4 and 30
amino-acid residues. Varied amino acid sequences, as found in actual proteins, involve between 10 and 25 residues
in forming 1000 Å^2 of protein surface. Schulz and Schirmer estimate that 100 Å^2 of protein surface can exhibit as many
as 1000 different specific patterns (SCHU79, p105). The number of surface patterns rises exponentially with the area
that can be varied independently. One of the BPTI structures recorded in the Brookhaven Protein Data Bank (6PTI),
for example, has a total exposed surface area of 3997 Å^2 (using the method of Lee and Richards (LEEB71) and a
solvent radius of 1.4 Å and atomic radii as shown in Table 7). If we could vary this surface freely and if 100 Å^2 can
produce 1000 patterns, we could construct 10^120 different patterns by varying the surface of BPTI! This calculation is
intended only to suggest the huge number of possible surface patterns based on a common protein backbone.
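
The order of magnitude quoted for BPTI follows from treating the exposed surface as independent 100 Å^2 patches, each able to show 1000 patterns. A minimal sketch of that arithmetic, with a function name of our own choosing:

```python
import math


def log10_surface_patterns(exposed_area, patch_area=100.0,
                           patterns_per_patch=1000.0):
    """log10 of patterns_per_patch ** (exposed_area / patch_area)."""
    return (exposed_area / patch_area) * math.log10(patterns_per_patch)


# BPTI (6PTI): 3997 square angstroms of exposed surface.
exponent = log10_surface_patterns(3997.0)
print(round(exponent, 2))  # about 119.91, i.e. roughly 10^120 patterns
```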

One protein framework cannot, however, display all possible patterns over any one particular 100 Å^2 of surface
merely by replacement of the side groups of surface residues. The protein backbone holds the varied side groups in
approximately constant locations so that the variations are not independent. We can, nevertheless, generate a vast
collection of different protein surfaces by varying those protein residues that face the outside of the protein.

Examination of a model of BPTI in contact with myoglobin shows that residues 3, 7, 8, 10, 13, 39, 41, and 42 can
all simultaneously contact a molecule the size and shape of myoglobin. Residue 49 cannot touch a single myoglobin
molecule simultaneously with any of the first set even though all are on the surface of BPTI. It is not the intent of the
present invention, however, to use models to determine which part of the target molecule will actually be the site of
binding by a PBD.

For cassette mutagenesis, the protein residues to be varied are, preferably, close enough in sequence that the
variegated DNA (vgDNA) encoding all of them can be made in one piece. The present invention is not limited to a
particular length of vgDNA that can be synthesized. With current technology, a stretch of 60 amino acids (180 DNA
bases) can be spanned.

One can use other mutational means, such as single-stranded-oligonucleotide-directed mutagenesis (BOTS85)
using two or more mutating primers to mutate widely separated residues.

Alternatively, to vary residues separated by more than sixty residues, two cassettes may be mutated. A first cassette
is mutagenized to produce a population having, for example, up to 30,000 members. Using variegated OCV, we
mutagenize a second cassette to produce a second variegated population having the desired diversity.

The composite level of variation must not exceed the prevailing capabilities to a) produce very large numbers of
independently transformed cells or b) detect small components in a highly varied population. The limits on the level of
variegation are discussed in Sec. 13.2.

We assemble the data about the IPBD and the target that are useful in deciding which residues to vary: 1) 3D
structure, or at least a list of residues on the surface of the IPBD, 2) list of sequences homologous to IPBD, and 3)
model of the target molecule or a stand-in for the target.

These data and an understanding of the behavior of different amino acids in proteins will be used to answer two
questions:

1) which residues of the IPBD are on the outside and close enough together in space to touch the target
simultaneously?

2) which residues of the IPBD can be varied with high probability of retaining the underlying IPBD structure?

Although an atomic model of the target material from X-ray crystallography, NMR, etc. is preferred in such
examination, it is not necessary. For example, if the target were a protein of unknown 3D structure, it would be sufficient to
know the molecular weight of the protein and whether it were a soluble globular protein, a fibrous protein, or a membrane
protein. One can then choose a protein of known structure of the same class and similar size to use as a
molecular stand-in and yardstick. At low resolution, all proteins of a given size and class look much the same. The
specific volumes are the same, all are more or less spherical and therefore all proteins of the same size and class
have about the same radius of curvature. The radii of curvature of the two molecules determine how much of the two
molecules can come into contact.

The most appropriate method of picking the residues of the protein chain at which the amino acids should be varied
is by viewing, with interactive computer graphics, a model of the IPBD. A stick-figure representation of molecules is
preferred. A suitable set of hardware is an Evans & Sutherland PS390 graphics terminal (Evans & Sutherland
Corporation, Salt Lake City, UT) and a MicroVAX II supermicro computer (Digital Equipment Corp., Maynard, MA). Suitable
programs for viewing and manipulating protein models include: a) PS-FRODO, written by T. A. Jones (JONE95) and
distributed by the Biochemistry Department of Rice University, Houston, TX; and b) PROTEUS, developed by Dayringer,
Tramontano, and Fletterick (DAYR86).

Theoretical calculations, such as dynamic simulations of proteins, are used to estimate the effect of substitution 
at a particular residue of a particular amino-acid type on the 3D structure of the parent protein. Such calculations might 
also indicate whether a particular substitution will greatly affect the flexibility of the protein. 

Sec. 13.1.1: The principal set:

Using the knowledge of which residues are on the surface of the IPBD, we pick residues that are close enough
together on the surface of the IPBD to touch a molecule of the target simultaneously without having any IPBD
main-chain atom come closer than van der Waals distance (viz., 4.0 to 5.0 Å) from any target atom. A residue of the IPBD
"touches" the target if: a) a main-chain atom is within van der Waals distance, viz., 4.0 to 5.0 Å, of any atom of the target
molecule, or b) the C_beta is within D_cutoff of any atom of the target molecule so that a side-group atom could make
contact with that atom. Because side groups differ in size (cf. Table 35), some judgment is required in picking D_cutoff.
In the preferred embodiment, we will use D_cutoff = 8.0 Å, but other values in the range 6.0 Å to 10.0 Å could be used.
If IPBD has G at a residue, we construct a pseudo C_beta with the correct bond distance and angles and judge the ability
of the residue to touch the target from this pseudo C_beta.
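
The touching criterion can be expressed as a simple distance test on atomic coordinates. The sketch below is one reading of the rule, not a prescribed implementation: it assumes coordinates in Å supplied as (x, y, z) tuples, treats "within van der Waals distance" as a separation between 4.0 and 5.0 Å, and flags any main-chain atom closer than 4.0 Å as a clash. All function and parameter names are hypothetical.

```python
import math


def main_chain_clashes(main_chain_atoms, target_atoms, vdw_min=4.0):
    """True if any main-chain atom is closer than vdw_min to a target atom."""
    return any(math.dist(a, t) < vdw_min
               for a in main_chain_atoms for t in target_atoms)


def residue_touches(main_chain_atoms, c_beta, target_atoms,
                    vdw_min=4.0, vdw_max=5.0, d_cutoff=8.0):
    """Apply criteria (a) and (b) to one residue of the IPBD.

    c_beta is the real C_beta, or a constructed pseudo C_beta for glycine.
    """
    for t in target_atoms:
        # (a) a main-chain atom lies at van der Waals distance from the target
        if any(vdw_min <= math.dist(a, t) <= vdw_max for a in main_chain_atoms):
            return True
        # (b) the C_beta lies within D_cutoff, so a side group could reach
        if math.dist(c_beta, t) <= d_cutoff:
            return True
    return False


target = [(0.0, 0.0, 0.0)]
print(residue_touches([(4.5, 0.0, 0.0)], (6.0, 0.0, 0.0), target))
print(residue_touches([(12.0, 0.0, 0.0)], (11.0, 0.0, 0.0), target))
```

A candidate principal set would then be the residues for which `residue_touches` holds in a single docking of the target (or its stand-in) while `main_chain_clashes` remains false.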

Alternatively, we choose a set of residues on the surface of the IPBD such that the curvature of the surface defined
by the residues in the set is not so great that it would prevent contact between all residues in the set and a molecule
of the target. This method is appropriate if the target is a macromolecule, such as a protein, because the PBDs derived
from the IPBD will contact only a part of the macromolecular surface.

We prefer that there be some indication that the underlying IPBD structure will tolerate substitutions at each residue
in the principal set of residues. Indications could come from various sources, including: a) homologous sequences, b)
static computer modeling, or c) dynamic computer simulations.

The residues in the principal set need not be contiguous in the protein sequence. We require only that the amino
acids in the residues to be varied all be capable of touching a molecule of the target material simultaneously without
having atoms overlap. If the target were, for example, horse heart myoglobin, and if the IPBD were BPTI, any set of
residues in one interaction set of BPTI defined in Table 34 could be picked.

Preferably, the principal set contains eight to sixteen residues. This number of residues is large enough
that a surface that is complementary to the target can be found, but is small enough that a significant fraction of the
surface can be varied at one time.



Sec. 13.1.2: The secondary set:

The secondary set comprises residues that touch residues in the primary set and are excluded from the primary
set because the residue: a) is internal, b) is highly conserved, or c) is on the surface, but the curvature of the IPBD
surface prevents the residue from being in contact with the target at the same time as one or more residues in the
primary set.

Internal residues are frequently conserved, but may tolerate some conservative changes such as I to L or
F to Y. [...] the detailed placement and dynamics of adjacent protein residues, and such variation may
be useful [...].

Surface residues in the secondary set are most often located on the periphery of the principal set, which do not
make direct contact with the target simultaneously with all other residues of the principal set. The charge on the amino
acid in one of these residues could, however, have a strong effect on binding. It is appropriate to vary the charge of
some or all of these residues to improve an SBD. For example, the variegated codon containing equimolar A and G
at base 1, equimolar C and A at base 2, and A at base 3 yields amino acids T, A, K, and E with equal probability.
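
The amino acids encoded by a variegated codon can be enumerated directly from the standard genetic code. The sketch below assumes equimolar bases at each varied position and equal incorporation; the function name is ours. For A or G at base 1, C or A at base 2, and A at base 3 it gives T, K, A, and E at one quarter each.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def variegated_codon(base1, base2, base3):
    """Map each encoded amino acid to its probability for equimolar mixtures.

    Each argument is a string of the bases allowed at that codon position.
    """
    counts = {}
    for codon in product(base1, base2, base3):
        amino_acid = CODON_TABLE["".join(codon)]
        counts[amino_acid] = counts.get(amino_acid, 0) + 1
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


print(variegated_codon("AG", "CA", "A"))
```

The same function can be used to check any proposed variegation scheme for unwanted stop codons, which appear under the key `*`.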

Sec. 13.1.3: Choice of residues to vary initially:

The allowed level of variegation that assures progressivity determines how many residues can be varied at once;
[...]. The user may pick residues to vary in many ways; the following is a preferred method. [...] pairs of residues
that are diametrically opposed across the face of the principal set. Two such pairs are used to delimit the surface,
up/down and right/left. Alternatively, three residues that form an inscribed triangle, having as large an area as possible,
on the surface are picked. One to three other residues are picked in a checkerboard fashion across the interaction
surface. Choice of widely spaced residues to vary creates the possibility for high specificity because all the intervening
residues must have acceptable complementarity before favorable interactions can occur at widely-separated residues.

The number of residues picked is coupled to the range through which each can be varied by the restrictions
discussed in Sec. 13.2. In the first round, we do not assume any binding between IPBD and the target and so progressivity
is not an issue. At the first round, the user may elect to produce a level of variegation such that each molecule of vgDNA
is potentially different through, for example, unlimited variegation of 10 codons (20^10 = approx. 10^13). One run of the
DNA synthesizer produces approximately 10^13 molecules of length 100 nts. Inefficiencies in ligation and transformation
will reduce the number of proteins actually tested to between 10^7 and 5 x 10^8. Multiple iterations of the process with
such very high levels of variegation will not yield repeatable results; the user must decide whether this is important.

Sec. 13.2: Range of variation at Each Site of Mutation: 

The total level of variegation is the product of the number of variants at each varied residue. Each varied residue
can have a different scheme of variegation, producing 2 to 20 different possibilities. We require that the process be
progressive, i.e. each variegation cycle produces a better starting point for the next variegation cycle than the previous
cycle produced.

N.B.: Setting the level of variegation such that the ppbd and many sequences related to the ppbd sequence are present
in detectable amounts ensures that the process is progressive. If the level of variegation is so high that the ppbd
sequence is present at such low levels that there is an appreciable chance that no transformant will display the PPBD,
then the best SBD of the next round could be worse than the PPBD. At excessively high levels of variegation, each
round of mutagenesis is independent of previous rounds and there is no assurance of progressivity. This approach
can lead to valuable binding proteins, but repetition of experiments with this level of variegation will not yield progressive
results. Excessive variation is not preferred.

If the level of variegation is such that the parental sequence and each single amino-acid change is present for 
selection, then we know that a selected sequence is closer to optimal or the same as the parent. If, on the other hand, 
very high levels of variegation are used, a sequence may be selected, not because it is superior to the parental
sequence, but because the parental and improved sequences are, by chance, absent.

Progressivity is not an all-or-nothing property. So long as most of the information obtained from previous variegation
cycles is retained and many different surfaces that are related to the PPBD surface are produced, the process is
progressive. If the level of variegation is so high that the ppbd gene may not be detected, the assurance of progressivity
diminishes. If the probability of recovering PPBD is negligible, then the probability of progressive behavior is also
negligible.
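
The link between level of variegation and recovery of the PPBD can be illustrated numerically. The sketch below is only an illustration under stated assumptions: each transformant is treated as an independent draw from the vgDNA pool, the function names are hypothetical, and the library size of 10^7 transformants and the two parental fractions are example values (2.4 x 10^-7 is the parental abundance worked out later in this section; 20^-10 corresponds to ten codons varied through all twenty amino acids with equal probability).

```python
import math


def expected_parent_copies(parent_fraction, n_transformants):
    """Mean number of transformants that carry the parental (ppbd) sequence."""
    return parent_fraction * n_transformants


def prob_parent_absent(parent_fraction, n_transformants):
    """Chance that no transformant carries the parental sequence.

    Assumes each transformant is an independent draw from the vgDNA pool.
    """
    return math.exp(n_transformants * math.log1p(-parent_fraction))


# Hypothetical library of 10^7 independent transformants in both cases.
moderate = prob_parent_absent(2.4e-7, 1e7)        # parental fraction 2.4 x 10^-7
excessive = prob_parent_absent(20.0 ** -10, 1e7)  # ten codons, all twenty amino acids
print(f"moderate variegation:  P(parent absent) = {moderate:.3f}")
print(f"excessive variegation: P(parent absent) = {excessive:.6f}")
```

Under these assumptions the parental sequence is expected a few times over in the moderate case and is almost certainly missing in the excessive case, which is the situation in which progressivity is lost.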

An opposing force in our design considerations is that PBDs are useful in the population only up to the amount
that can be detected; any excess above the detectable amount is wasted. Thus we produce as many surfaces related
to PPBD as possible within the constraint that the PPBD be detectable.

We defer specification of exactly how much variegation is allowed until we have: a) specified real nt distributions
for a variegated codon, and b) examined the effects of discrepancies between specified nt distributions and actual nt
distributions.

Sec. 13.3: Design of vgDNA Encoding PBD Family: 

We must now decide how to distribute the variegation within the codons for the residues to be varied. These
decisions are influenced by the nature of the genetic code. When vgDNA is synthesized, variation at the first base of
a codon creates a population containing amino acids from the same column of the genetic code table (as shown in
Table 3-6 on p87 of WATS87); variation at the second base of the codon creates a population containing amino acids
from the same row of the genetic code table; variation at the third base of the codon creates a population containing
amino acids from the same box. If two or three bases in the same codon are varied, the pattern is more complicated.
Work with 3D protein structural models may suggest definite sets of amino acids to substitute at a given residue, but
the method of variation may require either more or fewer kinds of amino acids be included. For example, examination
of a model might suggest substitution of N or Q at a given residue. Combinatorial variation of codons requires that
mixing N and Q at one location also include K and H as possibilities at the same residue. One must choose to put: 1)
N only, 2) Q only, or 3) a mixture of N, K, H, and Q. The present invention does not rely on accurate predictions of
which amino acids should be placed at each residue; rather, attention is focused on which residues should be varied.
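
The N/Q example follows directly from the genetic code and can be reproduced by mixing two codons base by base. This is a small sketch with a hypothetical helper name (mix_codons); the particular codons chosen for N and Q are examples, not a prescription.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at every position ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}


def mix_codons(*codons):
    """Amino acids encoded when the given codons are mixed position by position."""
    positions = [sorted({codon[i] for codon in codons}) for i in range(3)]
    return sorted({CODE["".join(triplet)] for triplet in product(*positions)})


# N (AAT) mixed with Q (CAG): bases A/C, A, T/G.
print(mix_codons("AAT", "CAG"))
# N (AAC) mixed with Q (CAA): bases A/C, A, C/A.
print(mix_codons("AAC", "CAA"))
```

Either pairing of an N codon with a Q codon admits K and H as well, because the first and third bases vary independently during synthesis.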

There are many ways to generate diversity in a protein. (See RICH86, CARU85, and OLIP86.) One extreme case
is that one or a few residues of the protein are varied as much as possible (inter alia see CARU85, CARU87, RICH86,
and WHAR86). We will call this limit "Focused Mutagenesis". Focused Mutagenesis is appropriate when the IPBD or
other PPBD shows little or no binding to the target, as at the beginning of the search for a protein to bind to a new
target material. When there is no binding between the PPBD and the target, we preferably pick a set of five to seven
residues and vary each through all 20 possibilities.

An alternative plan of mutagenesis ("Diffuse Mutagenesis") is to vary many more residues through a more limited
set of choices (See Vershon et al., Ch15 of INOU86, and PAKU86). This can be accomplished by spiking each of the
pure nts activated for DNA synthesis (e.g. nt-phosphoramidites) with a small amount of one or more of the other
activated nts. Contrary to general practice, the present invention sets the level of spiking so that only a small percentage
(1% to .00001%, for example) of the final product contains the initial DNA sequence. Many single, double, triple, and
higher mutations occur, but recovery of the basic sequence is a possible outcome. Let N_b be the number of bases to
be varied, and let Q be the fraction of all sequences that should have the parental sequence; then M, the fraction of
the mixture that is the majority component, is

M = exp{ log_e(Q)/N_b } = 10^( log_10(Q)/N_b ).

If, for example, thirty base pairs on the DNA chain were to be varied and 1% of the product is to have the parental
sequence, then each mixed nt substrate should contain 86% of the parental nt and 14% of other nts. Table 8 shows
the fraction (fn) of DNA molecules having n non-parental bases when 30 bases are synthesized with reagents that
contain fraction M of the majority component. When M=.63096, f24 and higher are less than 10^-8. The entry "most" in
Table 8 is the number of changes that has the highest probability. Note that substantial probability for multiple
substitutions only occurs if the fraction of parental sequence (f0) is allowed to drop to around 10^-6. Mutagenesis of this sort
can be applied to any part of the protein at any time, but is most appropriate when some binding to the target has been
established. The N_b base pairs of the DNA chain that are synthesized with mixed reagents need not be contiguous.
They are picked so that between N_b/3 and N_b codons are affected to various degrees. The residues picked for mutation
are picked with reference to the 3D structure of the IPBD, if known. For example, one might pick all or most of the
residues in the principal and secondary set. We may impose restrictions on the extent of variation at each of these
residues based on homologous sequences or other data. The mixture of non-parental nts need not be random; rather,
mixtures can be biased to give particular amino acid types specific probabilities of appearance at each codon. For
example, one residue may contain a hydrophobic amino acid in all known homologous sequences; in such a case, the
first and third base of that codon would be varied, but the second would be set to T. This diffuse structure-directed
mutagenesis will reveal the subtle changes possible in protein backbone associated with conservative interior changes,
such as V to I, as well as some not so subtle changes that require concomitant changes at two or more residues of
the protein.
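
The spiking level for Diffuse Mutagenesis can be computed from the relation between M, Q, and N_b given above. The sketch below assumes that each varied base is substituted independently, so that the number of non-parental bases follows a binomial distribution; the function names are hypothetical and the code is not the program used to prepare Table 8.

```python
import math


def majority_fraction(q_parental, n_bases):
    """M such that M ** n_bases equals the desired parental fraction Q."""
    return q_parental ** (1.0 / n_bases)


def mutation_distribution(m, n_bases):
    """f[n] = fraction of molecules with exactly n non-parental bases.

    Assumes independent substitution at each of the n_bases varied positions.
    """
    return [math.comb(n_bases, n) * (1.0 - m) ** n * m ** (n_bases - n)
            for n in range(n_bases + 1)]


# Thirty varied bases, 1% of the product to keep the parental sequence.
m = majority_fraction(0.01, 30)
f = mutation_distribution(m, 30)
most = max(range(len(f)), key=f.__getitem__)  # most probable number of changes
print(f"M = {m:.4f}, f0 = {f[0]:.4f}, most probable number of changes = {most}")
```

For the example in the text this gives a majority component of about 86%, in agreement with the stated 86%/14% mixture.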

For Focused Mutagenesis, we now consider the distribution of nts that will be inserted at each variegated codon.
Each codon could be programmed differently. If we have no information indicating that a particular amino acid or class
of amino acid is appropriate, we strive to substitute all amino acids with equal probability because representation of
one pbd above the detectable level is wasteful. Equal amounts of all four nts at each position in a codon yields the
amino acid distribution in which each amino acid is present in proportion to the number of codons that code for it. This
distribution has the disadvantage of giving two basic residues for every acidic residue. In addition, six times as much
R, S, and L as W or M occur. If five codons are synthesized with this distribution, sequences encoding five Rs are
7776 times more abundant than sequences encoding five Ws. To have W-W-W-W-W present at detectable levels, we
must have R-R-R-R-R present in 7776-fold excess.
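
The figures quoted for the equimolar distribution follow from counting codons in the standard genetic code. A brief sketch (variable names are illustrative only):

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at every position ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}

# With equimolar nts at all three bases, abundance is proportional to codon count.
codon_counts = Counter(CODE.values())

basic_to_acidic = (codon_counts["R"] + codon_counts["K"]) / (
    codon_counts["D"] + codon_counts["E"])
five_r_over_five_w = (codon_counts["R"] / codon_counts["W"]) ** 5

print("codons per amino acid:", dict(codon_counts))
print("basic : acidic =", basic_to_acidic)
print("R-R-R-R-R : W-W-W-W-W =", five_r_over_five_w)
```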

Let Abun(x) be the abundance of DNA sequences coding for amino acid x, defined by the distribution of nts at
each base of the codon. For any distribution, there will be a most-favored amino acid (mfaa) with abundance
Abun(mfaa) and a least-favored amino acid (lfaa) with abundance Abun(lfaa). We seek the nt distribution that allows all
twenty amino acids and that yields the largest ratio Abun(lfaa)/Abun(mfaa) subject to two constraints: equal abundances
of acidic and basic amino acids and the least possible number of stop codons. Thus only nt distributions that yield
Abun(E)+Abun(D) = Abun(R)+Abun(K) are considered, and the function maximized is:

{(1 - Abun(stop)) (Abun(lfaa)/Abun(mfaa))}.

We have simplified the search for an optimal nt distribution by limiting the third base to T or G (C or G is equivalent).
All amino acids are possible and the number of accessible stop codons is reduced because TGA and TAA codons are
eliminated. The amino acids F, Y, C, H, N, I, and D require T at the third base while W, M, Q, K, and E require G. Thus
we use an equimolar mixture of T and G at the third base.
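
The consequences of restricting the third base to T or G can be verified by enumerating the 32 remaining codons. This is a short check written for illustration; the variable names are not taken from the disclosed programs.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at every position ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}

# Group the codons that remain when the third base is limited to T or G.
reachable = {}
for b1, b2, b3 in product(BASES, BASES, "TG"):
    reachable.setdefault(CODE[b1 + b2 + b3], []).append(b1 + b2 + b3)

stop_codons = reachable.pop("*", [])
need_t = sorted(aa for aa, cs in reachable.items() if all(c[2] == "T" for c in cs))
need_g = sorted(aa for aa, cs in reachable.items() if all(c[2] == "G" for c in cs))

print("amino acids reachable:", len(reachable))
print("stop codons remaining:", stop_codons)
print("require T at base 3:", need_t)
print("require G at base 3:", need_g)
```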

A computer program, written as part of the present invention and named "Find Optimum vgCodon" (See Table 9),
varies the composition at bases 1 and 2, in steps of 0.05, and reports the composition that gives the largest value of
the quantity {(Abun(lfaa)/Abun(mfaa)) (1 - Abun(stop))}. A vg codon is symbolically defined by the nt distribution at each
base:

             T      C      A      G
base #1 =    t1     c1     a1     g1
base #2 =    t2     c2     a2     g2
base #3 =    t3     c3     a3     g3

t1 + c1 + a1 + g1 = 1.0

t2 + c2 + a2 + g2 = 1.0

t3 = g3 = 0.5, c3 = a3 = 0.



The variation of the quantities t1, c1, a1, g1, t2, c2, a2, and g2 is subject to the constraint that Abun(E)+Abun(D)
equals Abun(K)+Abun(R):

Abun(E)+Abun(D) = g1*a2

Abun(K)+Abun(R) = a1*a2/2 + c1*g2 + a1*g2/2

g1*a2 = a1*a2/2 + c1*g2 + a1*g2/2

Solving for g2, we obtain

g2 = (g1*a2 - 0.5*a1*a2)/(c1 + 0.5*a1).

In addition,

t1 = 1 - a1 - c1 - g1

t2 = 1 - a2 - c2 - g2.

We vary a1, c1, g1, a2, and c2 and then calculate t1, g2, and t2. Initially, variation is in steps of 5%. Once an
approximately optimum distribution of nts is determined, the region is further explored with steps of 1%. The logic of this
program is shown in Table 9. The optimum distribution is:



Optimum vgCodon

             T      C      A      G
base #1 =    0.26   0.18   0.26   0.30
base #2 =    0.22   0.16   0.40   0.22
base #3 =    0.5    0.0    0.0    0.5

and yields DNA molecules encoding each type of amino acid with the abundances shown in Table 10.

The computer that controls a DNA synthesizer, such as the Milligen 7500, can be programmed to synthesize any
base of an oligo-nt with any distribution of nts by taking some nt substrates (e.g. nt phosphoramidites) from each of
two or more reservoirs. Alternatively, nt substrates can be mixed in any ratios and placed in one of the extra reservoirs
for so called "dirty bottle" synthesis.
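
The abundances implied by the "Optimum vgCodon" distribution given above can be recomputed from the nt fractions. The sketch below is not the program of Table 9: it only evaluates the stated objective for a given distribution and does not perform the search; the helper names (abundances, objective, solve_g2) are hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at every position ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}


def abundances(mix1, mix2, mix3):
    """Amino-acid (and stop) probabilities for one variegated codon."""
    result = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(
            mix1.items(), mix2.items(), mix3.items()):
        aa = CODE[b1 + b2 + b3]
        result[aa] = result.get(aa, 0.0) + p1 * p2 * p3
    return result


def objective(ab):
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa), the quantity to be maximized."""
    amino = [p for aa, p in ab.items() if aa != "*"]
    return (1.0 - ab.get("*", 0.0)) * min(amino) / max(amino)


def solve_g2(a1, c1, g1, a2):
    """g2 that balances Abun(E)+Abun(D) against Abun(K)+Abun(R)."""
    return (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)


THIRD = {"T": 0.5, "G": 0.5}
EQUIMOLAR = {b: 0.25 for b in BASES}
OPT1 = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
OPT2 = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}

optimum = abundances(OPT1, OPT2, THIRD)
equimolar = abundances(EQUIMOLAR, EQUIMOLAR, THIRD)
print("acidic:", optimum["D"] + optimum["E"], "basic:", optimum["K"] + optimum["R"])
print("objective, optimum vs equimolar:", objective(optimum), objective(equimolar))
print("g2 from the constraint:", solve_g2(0.26, 0.18, 0.30, 0.40))
```

With the tabulated fractions the acidic and basic abundances agree to within rounding, and the objective is higher than for equimolar nts at bases 1 and 2.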

The actual nt distribution obtained will differ from the specified nt distribution due to several causes, including: a)
differential inherent reactivity of nt substrates, and b) differential deterioration of reagents. It is possible to compensate
partially for these effects, but some residual error will occur. We denote the average discrepancy between specified
and observed nt fraction as S_err,

S_err = square root ( average[ ((f_obs - f_spec)/f_spec)^2 ] )

where f_obs is the amount of one type of nt found at a base and f_spec is the amount of that type of nt that was specified
at the same base. The average is over all specified types of nts and over a number (e.g. 10 or 20) of different variegated
bases. By hypothesis, the actual nt distribution at a variegated base will be within 5% of the specified distribution.
Actual DNA synthesizers and DNA synthetic chemistry may have different error levels. It is the user's responsibility to
determine S_err for the DNA synthesizer and chemistry employed.
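
A user who has measured nt fractions at several variegated bases could estimate S_err along the following lines. This sketch assumes that S_err is the root-mean-square fractional discrepancy between observed and specified nt fractions, averaged over the specified nts at each measured base; the function name and the example measurements are hypothetical.

```python
import math


def s_err(specified, observed):
    """Root-mean-square fractional discrepancy between observed and specified nts.

    specified, observed: lists of dicts {base: fraction}, one dict per variegated
    base. Only nts with a nonzero specified fraction enter the average.
    """
    terms = [((obs[b] - spec[b]) / spec[b]) ** 2
             for spec, obs in zip(specified, observed)
             for b in spec if spec[b] > 0]
    return math.sqrt(sum(terms) / len(terms))


# Hypothetical measurement: T/G specified at 0.5 each, found at 0.475 and 0.525.
example = s_err([{"T": 0.5, "G": 0.5}], [{"T": 0.475, "G": 0.525}])
print(f"S_err = {example:.3f}")
```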

To determine the possible effects of errors in nt composition on the amino-acid distribution, we modified the program
"Find Optimum vgCodon" in four ways:

1) the fraction of each nt in the first two bases is allowed to vary from its optimum value times (1 - S_err) to the
optimum value times (1 + S_err) in seven equal steps (S_err is the hypothetical fractional error level entered by the
user); the sum of nt fractions at one base always equals 1.0,

2) g2 is varied in the same manner as a2, i.e. we dropped the restriction that Abun(D)+Abun(E) = Abun(K)+Abun(R),

3) t3 and g3 are varied from 0.5 times (1 - S_err) to 0.5 times (1 + S_err) in three equal steps,

4) the smallest ratio Abun(lfaa)/Abun(mfaa) is sought.

In actual experiments, we will direct the synthesizer to produce the optimum DNA distribution "Optimum vgCodon"
given above. Incomplete control over DNA chemistry may, however, cause us to actually obtain the following distribution,
which is the worst that can be obtained if all nt fractions are within 5% of the amounts specified in "Optimum vgCodon".
A corresponding table can be calculated for any given S_err using the program "Find worst vgCodon within S_err of given
distribution" given in Table 11.

Optimum vgCodon, worst 5% errors

             T       C       A       G
base #1 =    [entries not recoverable from the source, apart from one value of 0.287]
base #2 =    0.209   0.160   0.400   0.231
base #3 =    0.475   0.0     0.0     0.525

This distribution yields DNA encoding different amino acids at the abundances shown in Table 12.
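
A worst-case search of the kind described can be sketched as follows. This is a simplified stand-in for the program of Table 11, not a reproduction of it: each nt fraction is scaled by one of three factors (1 - S_err, 1, 1 + S_err) rather than seven steps, each base is renormalized so that its fractions sum to 1.0 (so individual fractions can end up slightly more or less than S_err away from the specified value), and the function names are hypothetical. Its result need not coincide with the tabulated worst case.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at every position ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}


def abundances(mix1, mix2, mix3):
    """Amino-acid (and stop) probabilities for one variegated codon."""
    result = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(
            mix1.items(), mix2.items(), mix3.items()):
        aa = CODE[b1 + b2 + b3]
        result[aa] = result.get(aa, 0.0) + p1 * p2 * p3
    return result


def lfaa_mfaa_ratio(ab):
    """Abun(lfaa)/Abun(mfaa), ignoring stop codons."""
    amino = [p for aa, p in ab.items() if aa != "*"]
    return min(amino) / max(amino)


def perturbed(mix, s_err, steps=3):
    """Yield renormalized copies of mix, each fraction scaled within +/- s_err."""
    factors = [1 - s_err + 2 * s_err * i / (steps - 1) for i in range(steps)]
    for combo in product(factors, repeat=len(mix)):
        raw = {b: p * f for (b, p), f in zip(mix.items(), combo)}
        total = sum(raw.values())
        yield {b: v / total for b, v in raw.items()}


def worst_case(mix1, mix2, mix3, s_err=0.05):
    """Return (ratio, mix1, mix2, mix3) with the smallest lfaa/mfaa ratio found."""
    candidates = ((lfaa_mfaa_ratio(abundances(m1, m2, m3)), m1, m2, m3)
                  for m1 in perturbed(mix1, s_err)
                  for m2 in perturbed(mix2, s_err)
                  for m3 in perturbed(mix3, s_err))
    return min(candidates, key=lambda c: c[0])


OPT1 = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
OPT2 = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
THIRD = {"T": 0.5, "G": 0.5}

nominal_ratio = lfaa_mfaa_ratio(abundances(OPT1, OPT2, THIRD))
worst_ratio, worst1, worst2, worst3 = worst_case(OPT1, OPT2, THIRD)
print("nominal lfaa/mfaa:", nominal_ratio)
print("worst   lfaa/mfaa:", worst_ratio)
print("worst-case (mfaa/lfaa)^5:", (1.0 / worst_ratio) ** 5)
```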

If five codons are synthesized with reagents mixed so as to produce the nt distribution "Optimum vgCodon", and
if we actually obtained the nt distribution "Optimum vgCodon, worst 5% errors", then DNA sequences encoding the
mfaa at all of the five codons are about 277 times as likely as DNA sequences encoding the lfaa at all of the five codons;
about 24% of the DNA sequences will have a stop codon in one or more of the five codons.

When five codons are synthesized using equimolar mixtures at bases 1 and 2, (Abun(mfaa)/Abun(lfaa))^5 = 7776.
If we program the optimum nt distribution and come within 5%, then (Abun(mfaa)/Abun(lfaa))^5 = 277. The total number
of different PBDs is unchanged, but the least-favored sequence is about 28 times more abundant. Detecting the
least-favored amino-acid sequence when varying four residues with equimolar nts at each varied base requires as sensitive
a separation system as does detecting the least-favored amino-acid sequence when varying five residues with the
optimized nt distribution.

By hypothesis, the distribution "Optimum vgCodon" is used in the second version of the second variegation of
hypothetical example 2. The abundance of the DNA encoding each type of amino acid is, however, taken from
Table 12. The abundance of DNA encoding the parental amino acid sequence is:

Amount (parental seq.)
      F24       G30       D34       E42       T47
  = Abun(F) * Abun(G) * Abun(D) * Abun(E) * Abun(T)
  =  .0249  x  .0663  x  .0545  x  .0602  x  .0437
  = 2.4 x 10^-7

Therefore, DNA encoding the PPBD sequence as well as very many related sequences will be present in sufficient 
quantity to be detected and we are assured that the process will be progressive. 
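
The parental abundance quoted above is a simple product and can be rechecked from the five Table 12 values given in the text. The dictionary name below is illustrative only.

```python
import math

# Abundances quoted in the text for the parental residues F24, G30, D34, E42, T47.
table12_values = {"F": 0.0249, "G": 0.0663, "D": 0.0545, "E": 0.0602, "T": 0.0437}

parental_abundance = math.prod(table12_values[aa] for aa in "FGDET")
print(f"Amount (parental seq.) = {parental_abundance:.2e}")
```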
A level of variegation that allows recovery of the PPBD has two properties: 

1) we cannot regress because the PPBD is available,

2) an enormous number of multiple changes related to the PPBD are available for selection and we are able to 
detect and benefit from these changes. 



The user must adjust the list of residues to be varied and levels of variegation at each residue until the calculated
variegation is within the bounds set by M_ntv and C_sensi.

Preferably, we also consider the interactions between the sites of variegation and the surrounding DNA. If the
method of mutagenesis to be used is replacement of a cassette, we consider whether the variegation will generate
gratuitous restriction sites and whether they seriously interfere with the intended introduction of diversity. We reduce
or eliminate gratuitous restriction sites by appropriate choice of variegation pattern and silent alteration of codons
neighboring the sites of variegation. See the Detailed Example.



Sec. 14.1: Insertion of synthetic vgDNA into a plasmid:

For cassette mutagenesis, restriction sites were designed and synthesized, and are used to introduce the synthetic
vgDNA into the OCV. Restriction digestions and ligations are performed by standard methods (AUSU87). In the case
of single-stranded-oligonucleotide-directed mutagenesis, synthetic vgDNA is used to create diversity in the vector
(BOTS85).

Sec. 14.2: Transformation of cells: 



The present invention is not limited to any one method of transforming cells with DNA. Standard methods, such
as those described in MANI82, may be optimized for the particular host cells and OCV. The goal is to produce a large
number of independent transformants, preferably 10^7 or more. It is not necessary to isolate transformed cells between
transformation and affinity separation. We prefer to have transformed cells at high concentration so that they can be
plated densely on relatively few plates.

Sec. 14.3: Growth of the GP(vgPBD) population:

The transformed cells are grown first under non-selective conditions that allow expression of plasmid genes and
then selected to kill untransformed cells. [text illegible in source] appropriate level of induction, as determined
in Sec. 10.1. The GPs carrying the IPBD are harvested by a method appropriate to the package.

A high level of diversity can be generated by in vitro variegated synthesis of DNA and this diversity can be
maintained passively through several generations in an organism without positive selective pressure. Loss or reduction in
frequency of deleterious mutations is advantageous for the purposes of the present invention. It is preferable that the
selection be performed before more than a few generations elapse. Moreover, subdividing the variegated
population before amplification in an organism by removing a small sample (less than 10%) for further work would
result in loss of diversity; therefore, one should use all or most of the synthetic DNA and most or all of the transformed
cells.

Sec. 15: Isolation of GP(PBD)s with binding-to-target phenotypes:

The harvested packages are enriched for the binding-to-target phenotype by use of affinity separation involving
target material immobilized on a matrix. Packages that fail to bind to target material are washed away. If the packages
are bacteriophage or endospores, it may be desirable to include a bacteriocidal agent, such as azide, in the buffer to
prevent bacterial growth.

Sec. 15.1: Attaching the target material to a column:

Affinity column chromatography is the preferred method of affinity separation, but other affinity separation methods
may be used. A variety of commercially available support materials for affinity chromatography are used. These include
derivatized beads to which the target material is covalently linked, or non-derivatized material to which the target
material adheres irreversibly.

Suppliers of support material for affinity chromatography include: Applied Protein Technologies, Cambridge, MA;
Bio-Rad Laboratories, Rockville Center, NY; Pierce Chemical Company, Rockford, IL. Target materials are attached
to the matrix in accord with the directions of the manufacturer of each matrix preparation with consideration of good
presentation of the target.

Sec. 15.2: Reducing selection due to non-specific binding:

We reduce non-specific binding of GP(PBD)s to the matrix that bears the target in two ways:

1) we treat the column with blocking agents such as genetically defective GPs or a solution of protein before the
population of GP(vgPBD)s is chromatographed, and

2) we pass the population of GP(vgPBD)s over a matrix containing no target or a different target from the same
class as the actual target prior to affinity chromatography.

Step (1) above saturates any non-specific binding that the affinity matrix might show toward wild-type GPs or proteins
in general; step (2) removes components of our population that exhibit non-specific binding to the matrix or to molecules
of the same class as the target. If the target were horse heart myoglobin, for example, a column supporting bovine
serum albumin could be used to trap GPs exhibiting PBDs with strong non-specific binding to proteins. If cholesterol
were the target, then a hydrophobic compound, such as p-tertiarybutylbenzyl alcohol, could be used to remove GPs
displaying PBDs having strong non-specific binding to hydrophobic compounds. It is anticipated that PBDs that fail to
fold or that are prematurely terminated will be non-specifically sticky. The capacity of the initial column that removes
indiscriminately adhesive PBDs should be greater (e.g. 5-fold greater) than the column that supports the target
molecule.

Variation in the support material (polystyrene, glass, agarose, etc.) in analysis of clones carrying SBDs is used to
eliminate enrichment for packages that bind to the support material rather than the target.

Sec. 15.3: Eluting the column:

The population of GPs is applied to an affinity matrix under conditions compatible with the intended use of the 
binding protein and the population is fractionated by passage of a gradient of some solute over the column. The process 
enriches for PBDs having affinity for the target and for which the affinity for the target is least affected by the eluants used. 

Ions or cofactors needed for stability of PBDs (derived from IPBD) or target must be included in buffers at
appropriate levels. We first remove GP(PBD)s that do not bind the target by washing the matrix with the volume of the initial
buffer required to bring the optical [remainder of sentence illegible in source].
The column is then eluted with a gradient of increasing: a) salt, b) [H+] (decreasing pH), c) neutral solutes,
d) temperature (increasing or decreasing), or e) some combination of these factors. Salt is the most preferred solute for gradient
formation. Other solutes that generally weaken non-covalent interaction may also be used. "Salt" includes solutions
containing any of the following ionic species:

Na+      K+       Ca++     Mg++     NH4+
Li+      Sr++     Ba++     Rb+      Cs+
Cl-      Br-      SO4--    HSO4-    PO4---
HPO4--   H2PO4-   CO3--    HCO3-
Acetate
Citrate
Standard L-amino acids
Standard nucleotides
Guanidinium Cl

Other ionic or neutral solutes may be used. All solutes are subject to the necessity that they not kill the genetic packages.
Neutral solutes, such as ethanol, acetone, ether, or urea, are frequently used in protein purification, however, many of 
these are very harmful to bacteria and bacteriophage above low concentrations. Bacterial spores, on the other hand, 
are impervious to most neutral solutes. Several passes may be made through the steps in Sec. 15. Different solutes 
may be used in different analyses, salt in one, pH in the next, etc. 

Sec. 15.4: Recovery of packages: 

Recovery of packages that display binding to an affinity column may be achieved in several ways, including from:

1) fractions eluted with a gradient as described above,
2) fractions eluted with soluble target material,
3) cells grown in situ on the matrix,
4) cells incubated with parts of the matrix,
5) fractions eluted after chemically or enzymatically degrading the linkage holding the target to the matrix, and
6) regeneration of GPs after degrading the packages and recovering OCV DNA.

It is possible to utilize combinations of these methods. It should be remembered that what we want to recover from the
affinity matrix is not the GPs per se, but the information in them. Recovery of viable GPs is very strongly preferred, but
recovery of genetic material is essential.
Inadvertent inactivation of the GPs is very deleterious. It is preferred that maximum limits for solutes that do not
inactivate the GPs or denature the target or the column are determined. One may use conditions that denature the
column to elute GPs; before the target is denatured, a portion of the affinity matrix should be removed for possible use
as an inoculum. As the GPs are held together by protein-protein interactions and other non-covalent molecular
interactions, there will be cases in which the molecular package will bind so tightly to the target molecules on the affinity
matrix that the GPs can not be washed off in viable form. This will only occur when very tight binding has been obtained.
In these cases, methods (3) through (5) above can be used to obtain the bound packages or the genetic messages
from the affinity matrix.

It is possible, by manipulation of the elution conditions, to isolate SBDs that bind to the target at one pH (pH_b) but
not at another pH (pH_o). The population is applied at pH_b and the column is washed thoroughly at pH_b. The column is
then eluted with buffer at pH_o and GPs that come off at the new pH are collected and cultured. Similar procedures may
be used for other solution parameters, such as temperature. For example, GP(vgPBD)s could be applied to a column
supporting insulin. After eluting with salt to remove GPs with little or no binding to insulin, we elute with salt and glucose
to liberate GPs that display PBDs that bind insulin or glucose in a competitive manner.



Sec. 15.5: Amplifying the Enriched Packages:

Viable GPs having the selected binding trait are amplified by culture in a suitable medium, or, in the case of phage,
infection into a host so cultivated. If the GPs have been inactivated by the chromatography, the OCV carrying the
osp-pbd gene must be recovered from the GP, and introduced into a new, viable host.

Sec. 15.6: Determining whether further enrichment is needed:



The probability of isolating a GP with improved binding increases by C_eff with each separation cycle. Let N be the
number of distinct amino-acid sequences produced by the variegation. We want to perform K separation cycles before
attempting to isolate an SBD, where K is such that the probability of isolating a single SBD is 0.10 or higher.

K = the smallest integer >= log_10(0.10 N)/log_10(C_eff)

For example, if N were 1.0 x 10^7 and C_eff = 6.31 x 10^2, then log_10(1.0 x 10^6)/log_10(6.31 x 10^2) = 6.0000/2.8000 = 2.14.
Therefore we would attempt to isolate SBDs after the third separation cycle. After only two separation cycles, the
probability of finding an SBD is (6.31 x 10^2)^2 / (1.0 x 10^7) = .04 and attempting to isolate SBDs might be profitable.
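
The number of separation cycles can be computed directly from the expression for K. The following sketch uses hypothetical function names and simply restates the formula and the worked example above.

```python
import math


def cycles_needed(n_sequences, c_eff, target_prob=0.10):
    """Smallest integer K with K >= log10(target_prob * N) / log10(C_eff)."""
    return math.ceil(math.log10(target_prob * n_sequences) / math.log10(c_eff))


def prob_after(cycles, n_sequences, c_eff):
    """Probability of isolating an SBD after the given number of cycles (capped at 1)."""
    return min(1.0, c_eff ** cycles / n_sequences)


# Worked example from the text: N = 1.0 x 10^7, C_eff = 6.31 x 10^2.
k = cycles_needed(1.0e7, 6.31e2)
p_two = prob_after(2, 1.0e7, 6.31e2)
print(f"K = {k}, probability after two cycles = {p_two:.2f}")
```
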
Clonal isolates from the last fraction eluted in Sec. 15.3 containing any viable GPs, as well as clonal isolates
obtained by culturing an inoculum taken from the affinity matrix, are cultured. If K separation cycles have been
completed, samples from a number, e.g. 32, of these clonal isolates are tested for elution properties on the {target} column.
If none of the isolated, genetically pure GPs show improved binding to target, or if K cycles have not yet been completed,
then we pool and culture, in a manner similar to the manner set forth in Sec. 14.3, the GPs from the last few fractions
eluted (see Sec. 15.4) that contained viable GPs and from the GPs obtained by culturing an inoculum taken from the
column matrix. We then repeat the enrichment procedure described in Sec. 15. This cyclic enrichment may continue
for N_[subscript illegible in source] passes or until an SBD is isolated.

If one or more of the isolated GPs has improved retention on the {target} column, we determine whether the
retention of the candidate SBDs is due to affinity for the target material. Target material is attached to a different support
matrix at optimal density and the elution volumes of candidate GP(SBD)s are measured. We pick the candidate that
either has the highest elution volume or that is retained on the column after elution. If none of the candidate GP(SBD)s
has higher elution volume than GP(PPBD of this round), then we pool and culture the GPs from the last few fractions
that contained viable GPs and the GPs obtained by culturing an inoculum taken from the column matrix. We then repeat
the enrichment procedure of Sec. 15.

If all of the SBDs show binding that is superior to PPBD of this round, we pool and culture the GPs from the last
fraction that contains viable GPs and from the inoculum taken from the column. This population is re-chromatographed
at least one pass to fractionate further the GPs based on K_d.

If an RNA phage were used as GP, the RNA would either be cultured with the assistance of a helper phage or be 
reverse transcribed and the DNA amplified. The amplified DNA could then be sequenced or subcloned into suitable 
plasmids. 


Sec. 15.7: Characterizing the Population: 

We characterize members of the population showing desired binding properties by genetic and biochemical methods.
We obtain clonal isolates and test these strains by genetic and affinity methods to determine genotype and phenotype
with respect to binding to target. For several genetically pure isolates that show binding, we demonstrate that
the binding is caused by the artificial chimeric gene by excising the osp-sbd gene and crossing it into the parental GP.
We also ligate the deleted backbone of each GP from which the osp-sbd is removed and demonstrate that each backbone
alone cannot confer binding to the target on the GP. We sequence the osp-sbd gene from several clonal isolates.

Sec. 15.8: Testing of binding affinity:

For one or more clonal isolates, we subclone the sbd gene fragment, without the osp fragment, into an expression
vector such that each SBD can be produced as a free protein. Each SBD protein is purified by normal means, including
affinity chromatography. Physical measurements of the strength of binding are then made on each free SBD protein
by one of the following methods: 1) alteration of the Stokes radius as a function of binding of the target material,
measured by characteristics of elution from a molecular sizing column such as agarose, 2) retention of radiolabeled
SBD on a spun affinity column to which has been affixed the target material, or 3) retention of radiolabeled target
material on a spun affinity column to which has been affixed the SBD. The measurements of binding for each free SBD
are compared to the corresponding measurements of binding for the PPBD.

In each assay, we measure the extent of binding as a function of concentration of each protein, and other relevant
physical and chemical parameters.

In addition, the SBD with highest affinity for the target from each round is compared to the best SBD of the previous 
round (IPBD for the first round) and to the IPBD with respect to affinity for the target material. Successive rounds of 
mutagenesis and selection-through-binding yield increasing affinity until desired levels are achieved. 

If binding is not yet sufficient, we must decide which residues to vary next (see Sec. 16.0).

Sec. 15.9: Other Affinity Separation Means:

FACS may be used to separate GPs that bind fluorescent labeled target with the optimized parameters determined
in Part II. We discriminate against artifactual binding to the fluorescent label by using two or more different dyes, chosen
to be structurally different.

Electrophoretic affinity separation uses unaltered target so that only other ions in the buffer can give rise to
artifactual binding. Artifactual binding to the gel material gives rise to retardation independent of field direction and so is
easily eliminated. A variegated population of GPs will have a variety of charges.

First the variegated population of GPs is electrophoresed in a gel that contains no target material. The
electrophoresis continues until the GPs are distributed along the length of the lane. The target-free lane in which the initial
electrophoresis is conducted is separated by a removable baffle from a square of gel that contains target material. The
baffle is removed and a second electrophoresis is conducted at right angles to the first. GPs that do not bind target
migrate with unaltered mobility while GPs that do bind target will separate from the majority that do not bind target. A
diagonal line of non-binding GPs will form. This line is excised and discarded. Other parts of the gel are dissolved and
the GPs cultured.

Sec. 16.0: The Next Variegation Cycle:

Which residues of the PBD should be varied in the next variegation cycle? The general rule is to preserve as much
accumulated information as possible. The amino acids just varied are the ones best determined. The environment of
other residues has changed, so that it is appropriate to vary them again. Because there are always more residues in
the principal and secondary sets than can be varied simultaneously, we start by picking residues that either have never
been varied (highest priority) or that have not been varied for one or more cycles. If we find that varying all the residues
except those varied in the previous cycle does not allow a high enough level of diversity, then residues varied in the
previous cycle might be varied again. For example, if the number of independent transformants that can be produced
and the sensitivity of the affinity separation were such that seven residues could be varied, and if the principal and
secondary sets contained 13 residues, we would always vary seven residues, even though that implies varying some
residue twice in a row. In such cases, we would pick the residues just varied that contain the amino acids of highest
abundance in the variegated codons used.
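
The priority rule described above can be sketched as a simple ranking. This is an illustration under stated assumptions, not the procedure of the specification: each residue is tagged with the number of complete cycles since it was last varied (`None` if never varied, `0` if varied in the previous cycle), and `capacity` is the number of residues that can be varied at once. The tie-break among just-varied residues by codon abundance is not modeled; ties are broken by name only to keep the output deterministic.

```python
from typing import Dict, List, Optional, Tuple


def pick_residues(cycles_since_varied: Dict[str, Optional[int]], capacity: int) -> List[str]:
    """Rank residues for the next variegation cycle and return the top `capacity`.

    Never-varied residues (None) come first, then residues left unvaried
    for the most cycles; residues varied in the previous cycle (0) come last.
    """
    def priority(item: Tuple[str, Optional[int]]) -> Tuple[int, int, str]:
        name, age = item
        if age is None:
            return (0, 0, name)      # never varied: highest priority
        return (1, -age, name)       # longer since last varied ranks earlier

    ranked = sorted(cycles_since_varied.items(), key=priority)
    return [name for name, _ in ranked[:capacity]]


if __name__ == "__main__":
    # Hypothetical residue labels and histories, for illustration only.
    history = {"K15": 0, "A16": 0, "R17": None, "I18": 2, "I19": 1}
    print(pick_residues(history, 3))
```

When `capacity` exceeds the number of residues not varied in the previous cycle, the ranking falls through to the just-varied residues, matching the case in the text where some residue is varied twice in a row.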

It is the accumulation of information that allows the process to select those protein sequences that produce binding
between the SBD and the target. Some interfaces between proteins and other molecules involve twenty or more
residues. Complete variation of twenty residues would generate 10^26 different proteins. By dividing the residues that lie
close together in space into overlapping groups of five to seven residues, we can vary a large surface but never need
to test more than 10^7 to 10^9 candidates at once, a savings of 10^19 to 10^17 fold.
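
The counts quoted above follow from simple combinatorics, as the sketch below shows. It assumes all 20 amino acids are allowed at every varied residue. The savings are computed against the text's round figures of 10^7 and 10^9 candidates rather than the exact values of 20^5 and 20^7.

```python
AMINO_ACIDS = 20  # assumed: every amino acid type allowed at each varied residue


def n_sequences(n_residues: int) -> int:
    """Number of distinct proteins when n_residues are varied completely."""
    return AMINO_ACIDS ** n_residues


if __name__ == "__main__":
    full = n_sequences(20)
    print(f"20 residues: {full:.2e} sequences")
    for group in (5, 7):
        print(f"{group} residues: {n_sequences(group):.2e} sequences")
    # Savings relative to the round library sizes quoted in the text.
    for library in (1e7, 1e9):
        print(f"library of {library:.0e}: savings of about {1e26 / library:.0e} fold")
```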

Having picked the residues to vary, we again set the range of variegation for each residue according to the principles
set forth in Sec. 13.2, design the vgDNA encoding the desired mutants (Sec. 13.3), clone the vgDNA into GPs (Sec. 14),
and select-by-binding-to-target those GPs bearing SBDs (Sec. 15).



Sec. 17.0: OTHER CONSIDERATIONS: 
Sec. 17.1: Joint selections: 


One may modify the affinity separation of the method described to select a molecule that binds to material A but
not to material B. One needs to prepare two selection columns, one with material A and the other with material B. The
population of genetic packages is prepared in the manner described, but before applying the population to A, one
passes the population over the B column so as to remove those members of the population that have high affinity for
B. It may be necessary to amplify the population that does not bind to B before passing it over A. Amplification would
most likely be needed if A and B were in some ways similar and the PPBD has been selected for having affinity for A.

For example, to obtain an SBD that binds A but not B, three columns could be connected in series: a) a column
supporting some compound, neither A nor B, or only the matrix material, b) a column supporting B, and c) a column
supporting A. A population of GP(vgPBD)s is applied to the series of columns and the columns are washed with the
buffer of constant ionic strength that is used in the application. The columns are uncoupled, and the third column is
eluted with a gradient to isolate GP(PBD)s that bind A but not B.

One can also generate molecules that bind to both A and B. In this case we use a 3D model and mutate one face
of the molecule in question to get binding to A. We then mutate a different face to produce binding to B.

The materials A and B could be proteins that differ at only one or a few residues. For example, A could be a natural
protein for which the gene has been cloned and B could be a mutant of A that retains the overall 3D structure of A.
SBDs selected to bind A but not B must bind to A near the residues that are mutated in B. If the mutations were picked
to be in the active site of A (assuming A has an active site), then an SBD that binds A but not B will bind to the active
site of A and is likely to be an inhibitor of A.

To obtain a protein that will bind to both A and B, we can, alternatively, first obtain an SBD that binds A and a
different SBD that binds B. We can then combine the genes encoding these domains so that a two-domain
single-polypeptide protein is produced. The fusion protein will have affinity for both A and B.

One can also generate binding proteins with affinity for both A and B, such that these materials compete for the
same site on the binding protein. We guarantee competition by overlapping the sites for A and B. We first create a
molecule that binds to target material A. We then vary a set of residues defined as: a) those residues that were varied
to obtain binding to A, plus b) those residues close in 3D space to the residues of set (a) but that are internal and so
are unlikely to bind directly to either A or B. Residues in set (b) are likely to make small changes in the positioning of
the residues in set (a) such that the affinities for A and B will be changed by small amounts. Members of these
populations are selected for affinity to both A and B.

Sec. 17.2: Selection for non-binding:

The method of the present invention can be used to select proteins that do not bind to selected targets. Consider
a protein of pharmacological importance, such as streptokinase, that is antigenic to an undesirable extent. We can
take the pharmacologically important protein as IPBD and antibodies against it as target. Residues on the surface of
the pharmacologically important protein would be variegated and GP(PBD)s that do not bind to an antibody column
would be collected and cultured. Surface residues may be identified in several ways, including: a) from a 3D structure,
b) from hydrophobicity considerations, or c) chemical labeling. The 3D structure of the pharmacologically important
protein remains the preferred guide to picking residues to vary, except now we pick residues that are widely spaced
so that we leave as little as possible of the original surface unaltered.

Destroying binding frequently requires only that a single amino acid in the binding interface be changed. If
polyclonal antibodies are used, we face the problem that all or most of the strong epitopes must be altered in a single
molecule. Preferably, one would have a set of monoclonal antibodies, or a narrow range of antibody species. If we had
a series of monoclonal antibody columns, we could obtain one or more mutations that abolish binding to each
monoclonal [...] pharmacologically important protein [...] been mutated in a way that eliminates binding to the
antibodies having maximum affinity for the pharmacologically important protein. The GPs eluting at the lowest salt are
isolated and cultured. The isolated SBD becomes the PPBD to further rounds of variegation so that the antigenic
determinants are successively eliminated.

We can select for insertions or deletions that preserve the 3D structure [...] GP that expresses BPTI on its surface.
In the bpti-osp gene, we can replace the codons for K26 and A27 with five [...] (3.2 x 10^6 sequences). K26 and A27
are in a turn and are far from the trypsin binding surface. We [...] BPTI that retain high, specific affinity for trypsin.

Sec. 17.4: Created binding proteins not unique:

For each target, there are a large number of SBDs that may be found by the method of the present invention. To
[...] PBD in the population [...] bind to the [...] demonstrated (SMIT85).

Use of different variation schemes can yield different binding proteins. For any given target, a large plurality of
[...] one picks subsets to be varied is altered.

Sec. 17.5: Other modes of mutagenesis possible:

The modes of creating diversity in the population of GPs discussed herein are not the only modes possible. Any 
method of mutagenesis that preserves at least a large fraction of the information obtained from one selection and then 
introduces other mutations in the same domain will work. The limiting factors are the number of independent trans- 
formants that can be produced and the amount of enrichment one can achieve through affinity separation. Therefore 
the preferred embodiment uses a method of mutagenesis that focuses mutations into those residues that are most 
likely to affect the binding properties of the PBD and are least likely to destroy the underlying structure of the IPBD.

Other modes of mutagenesis might allow other GPs to be considered. For example, the bacteriophage lambda is 
not a useful cloning vehicle for cassette mutagenesis because of the plethora of restriction sites. One can, however, 
use single-stranded-oligo-nt-directed mutagenesis on lambda without the need for unique restriction sites. No one has 
used single-stranded-oligo-nt-directed mutagenesis to introduce the high level of diversity called for in the present 
invention, but if it is possible, such a method would allow use of phage with large genomes. 



Example 1 

BPTI-Derived Binding Protein for HHMb: Displayed by M13 Phage

Presented below is a hypothetical example of a protocol for developing a new binding molecule derived from BPTI
with affinity for horse heart myoglobin (HHMb) using the common E. coli bacteriophage M13 as genetic package. It
will be understood that some further optimization, in accordance with the teachings herein, may be necessary to obtain
the desired results. Possible modifications in the preferred method are discussed immediately following various steps
of the hypothetical example.

By hypothesis, we set the following technical capabilities: 



Y_DQ       500 ng/synthesis of ssDNA 100 bases long,
           10 ug/synthesis of ssDNA 60 bases long,
           1 mg/synthesis of ssDNA 20 bases long.
M_DNA      100 bases
Y_pl       1 mg/l
L_eff      0.1 % for blunt-blunt,
           4 % for sticky-blunt,
           11 % for sticky-sticky.
M_ntv      5 x 10^8
C_eff      900-fold enrichment
C_sensi    1 in 4 x 10^8
N_chrom    10 passes
[...]      0.05

Example 1, Part I

In this example, we will use M13 as a replicable GP and BPTI as IPBD. In Part I, we are concerned only with
getting BPTI displayed on the outer surface of an M13 derivative. Variable DNA may be introduced in the osp-ipbd
gene, but not within the region that codes for the trypsin-binding region of BPTI. Once BPTI is displayed on the
outer surface of an M13 derivative, we proceed to Part II to optimize the affinity separation procedures.

For this example, we choose a filamentous bacteriophage of E. coli, M13. We prefer phage over vegetative bacterial
cells because phage are much less metabolically active. We prefer phage over spores because the molecular
mechanisms of the virion formation and 3D structure of the virion are much better understood than are the corresponding
processes of spore formation and structures of spores.

M13 is a very well studied bacteriophage, widely used for DNA sequencing and as a genetic vector; it is a typical
member of the class of filamentous phages. The relevant facts about M13 and other phages that will allow us to choose
among phages are cited in Sec. 1.3.1.

Compared to other bacteriophage, filamentous phage in general are attractive and M13 in particular is especially
attractive because:

1) the 3D structure of the virion is known,

2) the processing of the coat protein is well understood,

3) the genome is expandable,

4) the genome is small,

5) the sequence of the genome is known,

6) the virion is physically resistant to shear, heat, cold, guanidinium Cl, low pH, and high salt,

7) the phage is a sequencing vector so that sequencing is especially easy, and

8) antibiotic-resistance genes have been cloned into the genome with predictable results (HINE80).

Other criteria listed in Sec. 1.0 and 1.3 are also satisfied: M13 is easily cultured and stored (FRIT85), each
infected cell yielding 100 to 1000 M13 progeny after infection. M13 has no unusual or expensive media requirements
and is easily harvested and concentrated (SALI64, YAMA70, FRIT85). M13 is stable toward physical agents:
temperature (10% of phage survive 30 minutes at 85°C), shear (Waring blender does not kill), desiccation (not applicable),
radiation (not applicable), age (stable for years).

M13 is stable toward chemicals: pH (< 2.2 (SMIT85)), surface active agents (not applicable), chaotropes
(guanidinium HCl = 6.0 M), ions (no specific sensitivities), organic solvents (ether and other organic solvents are lethal
(MARV78)), proteases (not applicable, HHMb not a protease). M13 is not known to be sensitive to other enzymes.

The M13 genome is 6423 b.p. and the sequence is known (SCHA78). Because the genome is small, cassette
mutagenesis is practical on RF M13 (AUSU87), as is single-stranded oligo-nt directed mutagenesis (FRIT85). M13 is a
plasmid and transformation system in itself, and an ideal sequencing vector. M13 can be grown on Rec- strains of E.
coli. The M13 genome is expandable (MESS78, FRIT85). M13 confers no advantage, but doesn't lyse cells. The
sequence of gene VIII is known, and the amino acid sequence can be encoded on a synthetic gene, using the lacUV5 promoter
and used in conjunction with the LacIq repressor. The lacUV5 promoter is induced by IPTG. Gene VIII protein is secreted
by a well studied process and is cleaved between A23 and A24. Residues 18, 21, 22, and 23 of gene VIII protein control
cleavage. Mature gene VIII protein makes up the sheath around the circular ssDNA. The 3D structure of f1 virion is
known at medium resolution; the amino terminus of gene VIII protein is on the surface of the virion. No fusions to M13
gene VIII protein have been reported. The 2D structure of M13 coat protein is implicit in the 3D structure. Mature M13
gene VIII protein has only one domain. There are four minor proteins: gene III, VI, VII, and IX. Each of these minor
proteins is present in about 5 copies per virion and is related to morphogenesis or infection. The major coat protein is
present in more than 2500 copies per virion.

Although no fusions of M13 gene VIII to other genes have been reported, knowledge of the virion 3D structure
(BANN81) makes attachment of IPBD to the amino terminus of mature M13 coat protein (M13 CP) quite attractive.
Should direct fusion of BPTI to M13 CP fail to cause BPTI to be displayed on the surface of M13, we will vary part of
the BPTI sequence and/or insert short random DNA sequences between BPTI and M13 CP.

Smith (SMIT85) and de la Cruz et al. (CRUZ88) have shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. If BPTI cannot be made to appear on the virion outer surface by fusing
the bpti gene to the m13cp gene, we will fuse bpti to gene III either at the site used by Smith and by de la Cruz et al.
or to one of the termini. We will use a second, synthetic copy of gene III so that some unaltered gene III protein will be
present.

The gene VIII protein is chosen as OSP because it is present in many copies and because its location and orien- 
tation in the virion are known. Note that any uncertainty about the azimuth of the coat protein about its own alpha helical 
axis is unimportant. 

The 3D model of f1 indicates strongly that fusing BPTI to the amino terminus of M13 CP is more likely to yield a
functional protein than any other fusion site. (See Sec. 1.3.3).

The amino-acid sequence of M13 pre-coat (SCHA78), called AA_seq1, is



AA_seq1

         1    1    2   |2    3    3    4    4    5
    5    0    5    0   V5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA

    5    6    6    7  7
    5    0    5    0  3
MVVVIVGATIGIKLFKKFTSKAS



The single-letter codes for amino acids and the codes for ambiguous DNA are internationally recognized (GEOR87).
The best site for inserting a novel protein domain into M13 CP is after A23 because SP-I cleaves the precoat protein
after A23 as indicated by the arrow. Proteins that can be secreted will appear connected to mature M13 CP at its
amino terminus. Because the amino terminus of mature M13 CP is located on the outer surface of the virion, the
introduced domain will be displayed on the outside of the virion.

BPTI is chosen as IPBD of this example (See Sec. 2.1) because it meets or exceeds all the criteria: it is a small,
very stable protein with a well known 3D structure. Marks et al. (MARK86) have shown that a fusion of the phoA signal
peptide gene fragment and DNA coding for the mature form of BPTI caused native BPTI to appear in the periplasm of
E. coli, demonstrating that there is nothing in the structure of BPTI to prevent its being secreted.

Marks et al. (MARK87) also showed that the structure of BPTI is stable even to the removal of one of the cystine
bridges. They did this by replacing both C14 and C38 with either two alanines or two threonines. The C14/C38 cystine
bridge that Marks et al. removed is the one very close to the scissile bond in BPTI; surprisingly, both mutant molecules
functioned as trypsin inhibitors. This indicates that BPTI is redundantly stable and so is likely to fold into approximately
the same structure despite numerous surface mutations. Using the knowledge of homologues, vide infra, we can infer
which residues must not be varied if the basic BPTI structure is to be maintained.

The 3D structure of BPTI has been determined at high resolution by X-ray diffraction (HUBE77, MARQ83, WLOD84,
WLOD87a, WLOD87b), neutron diffraction (WLOD84), and by NMR (WAGN87). In one of the X-ray structures
deposited in the Brookhaven Protein Data Bank, "6PTI", there was no electron density for A58, indicating that A58 has no
uniquely defined conformation. Thus we know that the carboxy group does not make any essential interaction in the
folded structure. The amino terminus of BPTI is very near to the carboxy terminus. Goldenberg and Creighton reported
on circularized BPTI and circularly permuted BPTI (GOLD83). Some proteins homologous to BPTI have more or fewer
residues at either terminus.

BPTI has been called "the hydrogen atom of protein folding" and has been the subject of numerous experimental
and theoretical studies (STAT87, SCHW87, GOLD83, CHAZ83).

BPTI has the added advantage that at least 32 homologous proteins are known, as shown in Table 13. A tally of
ionizable groups is shown in Table 14 and the composite of amino acid types occurring at each residue is shown in
Table 15.

BPTI is freely soluble and is not known to bind metal ions. BPTI has no known enzymatic activity. BPTI binds to
trypsin, K_d = 6.0 x 10^-14 M (TSCH87). BPTI is not toxic. If K15 of BPTI is changed to L, there is no measurable binding
between the mutant BPTI and trypsin (TSCH87).

All of the conserved residues are buried; of the seven fully conserved residues only G37 has noticeable exposure.
The solvent accessibility of each residue in BPTI is given in Table 16, which was calculated from the entry "6PTI" in the
Brookhaven Protein Data Bank with a solvent radius of 1.4 Å, the atomic radii given in Table 7, and the method of Lee
and Richards (LEEB71). Each of the 51 non-conserved residues can accommodate two or more kinds of amino acids.
By independently substituting at each residue only those amino acids already observed at that residue, we could obtain
approximately 7 x 10^42 different amino acid sequences, most of which will fold into structures very similar to BPTI.
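
The sequence count above is a product over positions of the number of amino acid types observed at each position. The sketch below shows that product and, as a plausibility check, the average diversity per position implied by 7 x 10^42 over 51 positions. The per-position counts in the usage example are made up for illustration; the actual counts are those of Table 15, which is not reproduced here.

```python
import math
from typing import Sequence


def count_sequences(types_per_position: Sequence[int]) -> int:
    """Number of sequences obtainable by substituting independently at each
    position only the amino acid types listed for that position."""
    return math.prod(types_per_position)


def mean_types_per_position(total_sequences: float, n_positions: int) -> float:
    """Geometric-mean number of amino acid types per position implied by a total."""
    return total_sequences ** (1.0 / n_positions)


if __name__ == "__main__":
    # Hypothetical counts for three positions, not taken from Table 15.
    print(count_sequences([2, 3, 4]))
    # Average diversity implied by the figure quoted in the text.
    print(round(mean_types_per_position(7e42, 51), 2))
```

The implied average is a little under seven amino acid types per non-conserved position.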

BPTI will be useful as an IPBD for macromolecules. (See Sec. 2.1.1) BPTI and BPTI homologues bind tightly and
with high specificity to a number of enzymes.

BPTI is strongly positively charged except at very high pH, thus BPTI is useful as IPBD for targets that are not
also strongly positive under the conditions of intended use (see Sec. 2.1.2). There exist homologues of BPTI, however,
having quite different charges (viz. SCI-III from Bombyx mori at -7 and the trypsin inhibitor from bovine colostrum at
-1). Once a derivative of M13 is found that displays BPTI on its surface, the sequence of the BPTI domain can be
replaced by one of the homologous sequences to produce acidic or neutral IPBDs.

BPTI is not an enzyme (See Sec. 2.1.3). BPTI is quite small; if this should cause a pharmacological problem, two
or more BPTI-derived domains may be joined as in the human BPTI homologue that has two domains.

A derivative of M13 is the preferred OCV. (See Sec. 3). A "phagemid" is a hybrid between a phage and a plasmid,
and is used in this invention. Double-stranded plasmid DNA isolated from phagemid-bearing cells is denoted by the
standard convention, e.g. pXY24. Phage prepared from these cells would be designated XY24. Phagemids such as
Bluescript K/S (sold by Stratagene) are not suitable for our purposes because Bluescript does not contain the full
genome of M13 and must be rescued by coinfection with helper phage. Such coinfections could lead to genetic
recombination yielding heterogeneous phage unsuitable for the purposes of the present invention.

The bacteriophage M13 bla 61 (ATCC 37039) is derived from wild-type M13 through the insertion of the beta
lactamase gene (HINE80). This phage contains 8.13 kb of DNA. M13 bla cat 1 (ATCC 37040) is derived from M13 bla
61 through the additional insertion of the chloramphenicol resistance gene (HINE80); M13 bla cat 1 contains 9.88 kb
of DNA. Although neither of these variants of M13 contains the ColE1 origin of replication, either could be used as a
starting point to construct a usable cloning vector for the present example.

The OCV for the current example is constructed by a process illustrated in Figure 4. A brief description of all the
plasmids and phagemids constructed for this Example is found in Table 17.

For ss oligo-nt site-directed mutagenesis, multiple primers lead to higher efficiency. Three non-mutagenic primers
are used: bases 2326-2352 of wt M13, bases 4854-4875 of wt M13, and the complement of bases 3431-3451 of
pBR322. Note that pLG2 and its derivatives carry the anti-sense strand of the ampR gene in the + DNA strand. The
segments are picked to be high in GC content and to divide the pLG7 genome into several segments of approximately
equal length.

The genetic engineering procedures needed to construct the OCV are standard, using commercially available
restriction enzymes under recommended conditions. All restriction fragments of DNA are purified by electrophoresis
or HPLC. M13 and its engineered derivatives are infected into E. coli strain PE384 (F+, Rec-, Sup*, AmpS). Plasmid
DNA of M13 derivatives is transformed into E. coli strain PE383 (F-, Rec-, Sup*, AmpS) so that we avoid multiple rounds
of infection in the culture. Isolation of M13 phage is by the procedure of Salivar et al. (SALI64); isolation of replicative
form (RF) M13 is by the procedure of Jazwinski et al. (JAZW73a and JAZW73b). Isolation of plasmids containing the
ColE1 origin of replication is by the method of Maniatis (MANI82).

We pick the ampR gene from pBR322 as a convenient antibiotic resistance gene. Another resistance gene, such
as kanamycin, could be used. The Acc I-to-Aat II fragment of pBR322 is a conveniently obtained source of ampR and
the ColE1 origin.

M13mp18 (New England BioLabs) contains neither Aat II nor Acc I sites. Therefore we insert an adaptor that allows
us to insert the Aat II-to-Acc I fragment of pBR322 that carries the ampR gene and the ColE1 origin of replication into
a desirable place in M13mp18. M13mp18 contains a lacUV5 promoter and a lacZ gene that are not useful to the
purposes of the present invention. By cutting M13mp18 with Ava II and Bsu36 I and discarding the approximately 600
intervening base pairs, we eliminate all recognition sites of several enzymes useful for engineering the bpti-gene VIII

 5' GACCGACGTCtgcctcGTATACCGGACCGcatagctCC    3'  olig#1
 3'    GCTGCAGacggagCATATGGCCTGGCgtatcgaGGACT 5'  olig#2
    AvaII | AatII |  | AccI | RsrII |   | Bsu36I
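
As a consistency check on the adaptor, the sketch below confirms that the two oligonucleotides are complementary over their duplex region and reports the single-stranded ends. The sequences are transcribed from olig#1 and olig#2 above, in upper case. The three-base offset and the recognition sequences used for Aat II, Acc I, and Rsr II are stated here as assumptions of the sketch, and the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Transcribed from the adaptor above; olig#2 is written 3'->5' as displayed.
OLIG1_5TO3 = "GACCGACGTCTGCCTCGTATACCGGACCGCATAGCTCC"
OLIG2_3TO5 = "GCTGCAGACGGAGCATATGGCCTGGCGTATCGAGGACT"

# Recognition sequences assumed for the sites named in the annotation line.
SITES = {"AatII": "GACGTC", "AccI": "GTATAC", "RsrII": "CGGACCG"}


def complement(seq: str) -> str:
    """Base-by-base complement, without reversing the strand."""
    return seq.translate(COMPLEMENT)


def annealed_adaptor(top_5to3: str, bottom_3to5: str, top_overhang: int):
    """Return (top 5' overhang, duplex length, bottom 5' overhang read 5'->3').

    Raises ValueError if the strands do not pair over the duplex region.
    """
    duplex_top = top_5to3[top_overhang:]
    duplex_len = len(duplex_top)
    if complement(duplex_top) != bottom_3to5[:duplex_len]:
        raise ValueError("strands are not complementary in the duplex region")
    bottom_overhang = bottom_3to5[duplex_len:][::-1]
    return top_5to3[:top_overhang], duplex_len, bottom_overhang


if __name__ == "__main__":
    print(annealed_adaptor(OLIG1_5TO3, OLIG2_3TO5, 3))
    print({name: site in OLIG1_5TO3 for name, site in SITES.items()})
```

The two three-base single-stranded ends are what would pair with the Ava II and Bsu36 I ends of the cut vector.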

The annealed adaptor is ligated with RF M13mp18 that has been cut with both Ava II and Bsu36 I and purified by
PAGE or HPLC. Transformed cells are selected for plasmid uptake with ampicillin. The resulting construct is called
pLG1.

DNA from pLG1 is cut with both Aat II and Acc I. The Aat II-to-Acc I fragment of pBR322 is ligated to the backbone of
pLG1. The correct construct is named pLG2.

The Acc I restriction site is no longer needed for vector construction. To eliminate this site, RF pLG2 dsDNA is cut
with Acc I, treated with Klenow fragment and dATP and dTTP to make it blunt, and then religated. The cloning vector,
named pLG3, is now ready for stepwise insertion of the osp-ipbd gene.

We are now ready to design a gene (See Sec. 4) that will cause BPTI domains to appear on the outer surface of
an M13 derivative: LG7.

To obtain a novel protein domain attached to the outside of M13, we insert DNA that codes for mature BPTI after
A23 of the precoat protein of M13. Mature BPTI begins with an arginine residue, which is charged; cleavage by signal
peptidase I is normal in such cases. Signal peptidase I (SP-I) cuts a chimera of M13 coat protein and BPTI after A23,
leaving mature BPTI attached at its carboxy end to the amino terminus of M13 CP.

The following amino-acid sequence, called AA_seq2, is constructed by inserting the sequence for mature BPTI
(shown underscored) immediately after the signal sequence of M13 precoat protein (indicated by the arrow) and before
the sequence for the M13 CP.



AA_seq2

         1    1    2   |2    3    3    4    4    5
    5    0    5    0   V5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFARPDFCLEPPYTGPCKARIIRYFYNAKA
                       ---------------------------

    5    6    6    7    7    8    8    9    9   10
    5    0    5    0    5    0    5    0    5    0
GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAAEGDDPAKAAFNSLQASAT
-------------------------------

   10   11   11   12   12   13
    5    0    5    0    5    0
EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS



Sequence numbers of fusion proteins refer to the fusion, as coded, unless otherwise noted. Thus the alanine that
begins M13 CP is referred to as "number 82", "number 1 of M13 CP", or "number 59 of the mature BPTI-M13 CP fusion".
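
The three numbering conventions can be checked by assembling the fusion from its parts, as in the sketch below. The segment sequences are transcribed from AA_seq2 with two readings treated as assumptions: "MVVVIV" in the coat protein and N at BPTI position 24. The function name is illustrative.

```python
# Segments transcribed from AA_seq2 (see the assumptions in the lead-in).
SIGNAL = "MKKSLVLKASVAVATLVPMLSFA"  # residues 1-23 of M13 precoat
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"
M13_CP = "AEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"

FUSION = SIGNAL + BPTI + M13_CP


def numbering(index_in_cp: int):
    """Map a 1-based M13 CP residue number to (number in the fusion as coded,
    number in the mature fusion after the signal sequence is removed)."""
    as_coded = len(SIGNAL) + len(BPTI) + index_in_cp
    mature = len(BPTI) + index_in_cp
    return as_coded, mature


if __name__ == "__main__":
    print(len(SIGNAL), len(BPTI), len(M13_CP), len(FUSION))
    print(numbering(1))
```

With a 23-residue signal sequence and a 58-residue BPTI domain, residue 1 of M13 CP falls at position 82 as coded and position 59 of the mature fusion, in agreement with the text.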

The osp-ipbd gene is regulated by the lacUV5 promoter and terminated by the trpA transcription terminator. The
host strain of E. coli harbors the lacIq gene. The osp-ipbd gene is expressed and processed in parallel with the
wild-type gene VIII. The novel protein, which consists of BPTI tethered to an M13 CP domain, constitutes only a fraction of the
coat. Affinity separation is able to separate phage carrying only five or six copies of a molecule that has high affinity
for an affinity matrix (SMIT85); 1% incorporation of the chimeric protein results in about 30 copies of the protein exposed
on the surface. If this is insufficient, additional copies may be provided by, for example, increasing IPTG.

A model comprising M13 coat, after the model for f1 of Marvin and colleagues (BANN81), and a BPTI domain,
taken from the Brookhaven Protein Data Bank entry "6PTI", was constructed by standard model building methods that
ensure that covalent bond lengths and angles are close to acceptable values. The model shows that the fusion protein
could fit into the supramolecular structure in a stereochemically acceptable fashion without disturbing the internal
structure of either the M13 CP or BPTI domain.

The ambiguous DNA sequence coding for AA_seq2 is examined by a computer program for places where
recognition sites for restriction enzymes could be created without altering the amino-acid sequence. (See Sec. 4.3). A
master table of enzymes is compiled from the catalogues of enzyme suppliers. The enzymes that do not cut the OCV
(Preferably constructed as described above).

Using the procedure given in Sec. 4.3, we design an ipbd gene, such as that shown in Table 25. Some restriction
enzymes (e.g. Ban I or Hph I) cut the OCV too often to be of value.

The entire DNA sequence of the m13cp-bpti fusion with annotation appears in Table 25 showing the useful restriction
sites and biologically important features, viz. the lacUV5 promoter, the lacO operator, the Shine-Dalgarno sequence,
the amino acid sequence, the stop codons, and the transcriptional terminator.

The ipbd gene is synthesized in several steps using the method described in Sec. 5.1, generating dsDNA fragments
of 150 to 190 base pairs.

The four steps (See Sec. 6.1) by which we clone synthetic fragments of the m13cp-bpti gene (the osp-ipbd gene
of the present example) into pLG3 and its derivatives are illustrated in Figure 5.

The sequence to be introduced into pLG3 comprises a) the segment from RsrII to AvrII (Table 25), b) a spacer
sequence (gccgctcc), and c) the segment from AsuII to SauI. The segment is 153 bases long and is synthesized from
two shorter synthetic oligo-nts as described in Sec. 5.1 of the generic specification.

Table 27 shows the antisense strand of the sequence to be inserted. The 99 base fragment shown in upper case
letters and underscored (5'-CCGTCC....CCTTCG-3' = olig#3) is synthesized in the standard manner. Similarly, the 100
base long fragment of the sense strand shown in lower case (5'-cgctca....aattg-3' = olig#4) is synthesized. After
annealing, the double-stranded region is extended with Klenow fragment by the procedure given above to make the entire
176 bases double stranded. The overlap region is 23 base pairs long and contains 14 CG pairs and 9 AT pairs. The
DNA between AvrII and AsuII does not code for anything in the final ipbd gene; it is there so that the DNA can be cut
by both AvrII and AsuII at the same time in the next step. Eight bases have been added to the left of RsrII and nine
bases have been added to the left of SauI (same specificity and cutting pattern as Bsu36I). These bases at the ends
are not part of the final product; they must be present so that the restriction enzymes can bind and cut the synthetic
DNA to produce specific sticky ends.
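
The length bookkeeping for the annealing and Klenow extension step can be checked with a short calculation. The following is a minimal sketch in Python, not part of the disclosed procedure; the function names are hypothetical and only the numbers (99, 100, 23, 14, 9) come from the text.

```python
def extended_duplex_length(len_a: int, len_b: int, overlap: int) -> int:
    """Length of the duplex obtained when two oligonucleotides that anneal
    over `overlap` bases at their 3' ends are filled in by a polymerase."""
    if overlap <= 0 or overlap > min(len_a, len_b):
        raise ValueError("overlap must be positive and no longer than either oligo")
    return len_a + len_b - overlap


def cg_fraction(cg_pairs: int, at_pairs: int) -> float:
    """Fraction of CG pairs in the annealed overlap region."""
    return cg_pairs / (cg_pairs + at_pairs)


# olig#3 is 99 bases, olig#4 is 100 bases, and they overlap by 23 base pairs.
FULL_LENGTH = extended_duplex_length(99, 100, 23)
# The overlap contains 14 CG pairs and 9 AT pairs.
OVERLAP_CG = cg_fraction(14, 9)

print(FULL_LENGTH, round(OVERLAP_CG, 2))
```

The result agrees with the 176 bases stated above, and the overlap is about 61% CG.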

The synthetic DNA is cut with both SauI and RsrII and is ligated to similarly cut dsDNA of pLG3. The construct
with the correct insert is called pLG4.

The second step of the construction of the OCV is illustrated in Table 28. As in the construction of pLG4, two pieces
of single-stranded DNA are synthesized: a 99 base long fragment of the anti-sense strand ending with p25 and a 99
base long fragment (starting with p18). Both the synthetic dsDNA and dsRF pLG4 DNA are cut with both AvrII and
AsuII and are ligated and used to transform E. coli. The construct carrying this second insert is called pLG5.

Construction of pLG6 proceeds similarly to the construction of pLG5. The sequence is shown in Table 30. The two 
single stranded segments (one from the anti-sense strand ending with N66 and the other from the sense strand starting 
with the third base of the codon for Y58) are synthesized, annealed, and extended with Klenow fragment. Both the 
synthetic DNA and RF pLG5 are cut with both BssH I and AsuII, purified, and the appropriate pieces are ligated and
used to transform E. coli. 

The construction of pLG7 is illustrated in Table 32 and proceeds similarly to the constructions of pLG4, pLG5, and 
pLG6. The two single stranded segments (one from the anti-sense strand ending with the first base of the codon for 
V110 and the other beginning with E101) are synthesized, annealed, and extended with Klenow fragment. Both the 
synthetic DNA and RF pLG6 are cut with both BbeI and AsuII, purified, and the appropriate pieces are ligated and
used to transform E. coli. The construct with the correct fourth insert is called pLG7; the display of BPTI on the outer
surface of LG7 is verified by the methods of Sec. 8. 

M13am429 is an amber mutation of M13 used to reduce non-specific binding by the affinity matrix for phages
derived from M13. M13am429 is derived by standard genetic methods (MILL72) from wtM13.

Phage LG7 is grown on E. coli strain PE384 in LB broth with various concentrations of IPTG added to the medium
to induce the osp-ipbd gene. Phage LG7 is obtained from cells grown with 0.0, 0.1, 1.0, 10.0 or 100.0 uM, or 1.0 mM
IPTG, harvested (See Sec. 7) by the method of Salivar (SALI64), and concentrated to obtain a titre of 10^12 pfu/ml by
the method of Messing (MESS83).

The preferred method of determining whether LG7 displays BPTI on its surface (See Sec. 8) is to determine whether
these phage can retain a labeled derivative of trypsin (trp) or anhydrotrypsin (AHTrp) on a filter that allows passage of
unbound trp or AHTrp. Trypsin contains 10 tyrosine residues and can be iodinated with 125I by standard methods; we
denote the labeled trypsin as "trp*". Labeled anhydrotrypsin is denoted as "AHTrp*". Other types of labels can be used
on trp or AHTrp, e.g. biotin or a fluorescent label. AHTrp* or trp* is labeled to an activity of 0.3 uCi/ug. A sample of
10^12 LG7(10 mM IPTG) is mixed with 1.0 ug of trp* or AHTrp* in 1.0 ml of a buffer of 10 mM KCl, adjusted to pH 8.0
with 1 mM K2HPO4/KH2PO4. The mixture is passed through an Amicon MSP1 system fitted with a membrane filter
that allows passage of proteins smaller than Mr = 300,000. Filters are soaked in buffer containing trp or AHTrp prior to
the analysis. The filter is washed twice with 0.5 ml of buffer containing trp or AHTrp. The radioactivity retained on the
filter is quantitated with a scintillation counter or other suitable device. If each virion displays one copy of BPTI, then
0.05 ug of protein can be bound that would give rise to 3 x 10^4 disintegrations / minute on the filter.
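
The expected count rate can be estimated from the number of virions and the specific activity of the label. The sketch below is illustrative only: the molecular weight of trypsin (taken here as about 23,800) and the conversion of 2.22 x 10^6 disintegrations per minute per uCi are values assumed for the calculation and are not stated in the text; the function name is hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
DPM_PER_MICROCURIE = 2.22e6  # disintegrations per minute in 1 uCi
TRYPSIN_MR = 23_800          # assumed molecular weight of bovine trypsin, g/mol


def bound_label(virions, copies_per_virion, activity_uci_per_ug, mr=TRYPSIN_MR):
    """Return (micrograms of labeled protein bound, expected dpm on the filter),
    assuming one labeled trypsin molecule is retained per displayed BPTI."""
    moles = virions * copies_per_virion / AVOGADRO
    micrograms = moles * mr * 1e6
    dpm = micrograms * activity_uci_per_ug * DPM_PER_MICROCURIE
    return micrograms, dpm


# 10^12 virions, one BPTI per virion, label at 0.3 uCi/ug.
UG_BOUND, DPM = bound_label(1e12, 1, 0.3)
print(f"{UG_BOUND:.3f} ug bound, {DPM:.2e} dpm")
```

Under these assumptions roughly 0.04 ug of protein is retained, giving a count rate on the order of 3 x 10^4 disintegrations per minute.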

An alternative way to quantitate display of BPTI on the surface of LG7 is to use the stoichiometric binding between
trypsin and BPTI to titrate the BPTI. A solution that titers 10^12 pfu/ml of a phage is approximately 1.6 x 10^-9 M in phage
if each virion is infective. The ratio of pfu to total phage can be determined spectrophotometrically using the molar
extinction coefficients at 260 nm and 280 nm corrected for the increased length of LG7 as compared to wtM13. For
example, if a 1.0 ml solution that contains 10^12 pfu of LG7 phage grown with 1.0 mM IPTG inhibits trypsin solutions
up to 4.8 x 10^-7 M, we calculate that there are approximately 300 BPTIs/GP, i.e. (4.8 x 10^-7 mol of BPTI/l)/(1.6
x 10^-9 mol of phage/l). Inhibition of a specified concentration of trypsin is most easily measured spectrophotometrically
using a peptide-linked dye, such as N-alpha-benzoyl-Arg-Nan (TSCH87).
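
The titration arithmetic can be reproduced as follows. This is a minimal sketch with hypothetical function names; it assumes, as the text does, that every virion is infective unless a different infective fraction is supplied.

```python
AVOGADRO = 6.022e23  # particles per mole


def phage_molarity(pfu_per_ml, infective_fraction=1.0):
    """Molar concentration of virions for a given titre.
    infective_fraction is the ratio of pfu to total phage (1.0 if all are infective)."""
    particles_per_litre = pfu_per_ml / infective_fraction * 1000.0
    return particles_per_litre / AVOGADRO


def bpti_per_phage(trypsin_inhibited_molar, phage_molar):
    """Copies of BPTI per virion, assuming 1:1 trypsin:BPTI binding."""
    return trypsin_inhibited_molar / phage_molar


PHAGE_M = phage_molarity(1e12)            # about 1.66e-9 M
COPIES = bpti_per_phage(4.8e-7, 1.6e-9)   # the rounded values used in the text
print(f"{PHAGE_M:.2e} M, {COPIES:.0f} BPTI per virion")
```

With the rounded phage concentration of 1.6 x 10^-9 M the ratio is 300; with the unrounded value it is slightly lower, about 290.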

Alternatively, binding to an affinity column may be used to demonstrate the presence of BPTI on the surface of
phage LG7. An affinity column of 2.0 ml total volume having BioRad Affi-Gel 10(TM) matrix and 30 mg of AHTrp as
affinity material is prepared by the method of BioRad. The void volume (Vv) of this column is, by hypothesis, 1.0 ml.
This affinity column is denoted {AHTrp}.

A sample of 10^12 M13am429 is applied to {AHTrp} in 1.0 ml of 10 mM KCl buffered to pH 8.0 with KH2PO4/K2HPO4.
The column is then washed with the same buffer until the optical density at 280 nm of the effluent returns to
base line or 4 x Vv have been passed through the column, whichever comes first. Samples of LG7 or LG10 are then
applied to the blocked {AHTrp} column at 10^12 pfu/ml in 1.0 ml of the same buffer. The column is then washed again
with the same buffer until the optical density at 280 nm of the effluent returns to base line or 4 x Vv have been passed
through, whichever comes first. Following this wash, a gradient of KCl from 10 mM to 2 M in 3 x Vv, buffered to pH 8.0
with phosphate, is passed over the column. The first KCl gradient is followed by a KCl gradient running from 2 M to 5
M in 3 x Vv. The second KCl gradient is followed by a gradient of guanidinium Cl from 0.0 M to 2.0 M in 2 x Vv in 5 M
KCl and buffered to pH 8.0 with phosphate. Fractions of 50 ul are collected and assayed for phage by plating 4 ul of
each fraction at suitable dilutions on sensitive cells. Retention of phage on the column is indicated by appearance of
LG7 phage in fractions that elute significantly later from the column than control phage LG10 or wtM13. A successful
isolate of LG7 that displays BPTI is identified, the bpti insert and junctions are sequenced, and this isolate is used for
further work described below.

If vgDNA is used to obtain a functional fusion between a BPTI mutant and M13 CP (vide infra), then DNA from a
clonal isolate is sequenced in the regions that were variegated. Then gratuitous restriction sites for useful restriction
enzymes are removed if possible by silent codon changes. The sequence numbers of residues in OSP-IPBD will be
changed by any insertions; hereinafter, we will, however, denote residues inserted after residue 23 as 23a, 23b, etc.
Insertions after residue 81 will be denoted as 81a, 81b, etc. This preserves the numbering of residues between C5
and C55 of BPTI. Residue C5 of BPTI is always denoted as 28 in the fusion; residue C55 of BPTI is always denoted
as 78 in the fusion, and the intervening residues have constant numbers.
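
The numbering convention can be expressed compactly. The following is a sketch only, with hypothetical helper names; the offset of 23 is the length of the signal sequence that precedes BPTI in AA_seq2.

```python
import string

SIGNAL_LENGTH = 23  # residues of signal sequence preceding BPTI in AA_seq2


def bpti_to_fusion(bpti_residue: int) -> int:
    """Fusion (AA_seq2) number of a BPTI residue; unaffected by insertions
    because inserted residues receive letter suffixes instead of new numbers."""
    return bpti_residue + SIGNAL_LENGTH


def insertion_labels(anchor: int, count: int) -> list:
    """Labels for `count` residues inserted after residue `anchor` (e.g. 23a, 23b)."""
    if count > len(string.ascii_lowercase):
        raise ValueError("this sketch supports at most 26 inserted residues")
    return [f"{anchor}{string.ascii_lowercase[i]}" for i in range(count)]


print(bpti_to_fusion(5), bpti_to_fusion(55))   # C5 and C55 of BPTI
print(insertion_labels(81, 4))
```

For example, K15 of BPTI maps to residue 38 of the fusion, and four residues inserted after residue 81 are labeled 81a to 81d.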

Should LG7 phage from cells grown with 10 mM IPTG fail to display BPTI on its surface, we have several options.
We might try to determine why the construction failed to work as expected. There are various possible modes of failure,
including: a) BPTI is not cleaved from the M13 signal sequence, b) BPTI is cleaved from the M13 CP, and c) the
chimeric protein is made and cleaved after the signal sequence, but the processed protein is not incorporated into the
M13 coat. BPTI has been secreted from E. coli (MARK86); however the M13 coat-protein signal sequence was not
used. Therefore problems stemming from the signal sequence are unlikely, but possible. We could determine whether
BPTI was present in the periplasm or bound to the inner membrane of LG7-infected cells by assays using trp* or AHTrp*.

Proteins in the periplasm can be freed through spheroplast formation using lysozyme and EDTA in a concentrated
sucrose solution (BIRD67, MALA64). If BPTI were free in the periplasm, it would be found in the supernatant. Trp*
would be mixed with supernatant and passed over a non-denaturing molecular sizing column and the radioactive
fractions collected. The radioactive fractions would then be analyzed by SDS-PAGE and examined for BPTI-sized
bands by silver staining.

Spheroplast formation exposes proteins anchored in the inner membrane. Spheroplasts are mixed with AHTrp*
and then either filtered or centrifuged to separate them from unbound AHTrp*. After washing with hypertonic buffer,
the spheroplasts are analyzed for extent of AHTrp* binding; alternatively, membrane proteins are analyzed by western
blot analysis.

If BPTI is found free in the periplasm, then we would expect that the chimeric protein was being cleaved both
between BPTI and the M13 mature coat sequence and between BPTI and the signal sequence. In that case, we should
alter the BPTI/M13 CP junction by inserting vgDNA at codons for residues 78-82 of AA_seq2.

If BPTI is found attached to the inner membrane, then there are two likely explanations. The first is that the chimeric
protein is being cut after the signal sequence, but is not being incorporated into LG7 virion; the treatment would also
be to insert vgDNA between residues 78 and 82 of AA_seq2. The alternative hypothesis is that BPTI could fold and
react with trypsin even if signal sequence is not cleaved. N-terminal amino acid sequencing of trypsin-binding material
isolated from cell homogenate determines what processing is occurring. If signal sequence were being cleaved, we
would use the procedure above to vary residues between C78 and A82; subsequent passes would add residues after
residue 81. If signal sequence were not being cleaved, we would vary residues between 23 and 27 of AA_seq2.
Subsequent passes through that process would add residues after 23.

If BPTI were found neither in the periplasm nor on the inner membrane, then we would expect that the fault was
in the signal sequence or the signal-sequence-to-BPTI junction. The treatment in this case would be to vary residues
between 23 and 27.

Several experiments that introduce variegation into the bpti-gene VIII fusion are possible, including:

1) 3 variegated codons between residues 78 and 82 using olig#12 and olig#13,
2) 3 variegated codons between residues 23 and 27 using olig#14 and olig#15,
3) 5 variegated codons between residues 78 and 82 using olig#13 and olig#12a,
4) 5 variegated codons between residues 23 and 27 using olig#15 and olig#14a,
5) 7 variegated codons between residues 78 and 82 using olig#13 and olig#12b, and
6) 7 variegated codons between residues 23 and 27 using olig#15 and olig#14b.

To alter the BPTI-M13 CP junction, we introduce DNA variegated at codons for residues between 78 and 82 into
the Sph I and Sfi I sites of pLG7. The residues after the last cysteine are highly variable in amino acid sequences
homologous to BPTI, both in composition and length; in Table 25 these residues are denoted as G79, G80, and A81.
The first part of the M13 CP is denoted as A82, E83, and G84. One of the oligo-nts olig#12, olig#12a, or olig#12b and
the primer olig#13 are synthesized by standard methods. The oligo-nts are:



residue       75  76  77  78  79  80  81  82  83
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|GCT|GAA|-

84  85  86  87  88  89  90  91
GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12



residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

82  83  84  85  86  87
GCT|GAA|GGT|GAT|GAT|CCG|-

88  89  90  91
GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12a



residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

81c 81d 82  83  84  85  86  87
qfk|qfk|GCT|GAA|GGT|GAT|GAT|CCG|-

88  89  90  91
GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12b



residue   91  90  89  88  87  86
5' gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3'   olig#13



where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22 G), and
k is a mixture of equal parts of T and G. The bases shown in lower case at either end are spacers and are not
incorporated into the cloned gene. The primer is complementary to the 3' end of each of the longer oligo-nts. One of the
variegated oligo-nts and the primer olig#13 are combined in equimolar amounts and annealed. The dsDNA is completed
with all four (nt)TPs and Klenow fragment. The resulting dsDNA and RF pLG7 are cut with both Sfi I and Sph I, purified,
mixed, and ligated. This ligation mixture goes through the process described in Sec. 15 in which we select a transformed
clone that, when induced with IPTG, binds AHTrp.
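
The amino acid distribution implied by a single qfk codon follows directly from the stated nucleotide mixtures and the standard genetic code. The sketch below is illustrative; the function and variable names are hypothetical, and it assumes that the three positions are synthesized independently with exactly the stated proportions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i1 + 4 * i2 + i3]
    for (i1, b1), (i2, b2), (i3, b3) in product(enumerate(BASES), repeat=3)
}

# Nucleotide mixtures as defined in the text.
Q_MIX = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F_MIX = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K_MIX = {"T": 0.50, "G": 0.50}


def amino_acid_distribution(first, second, third):
    """Probability of each amino acid ('*' = stop) for one variegated codon."""
    dist = {}
    for b1, p1 in first.items():
        for b2, p2 in second.items():
            for b3, p3 in third.items():
                aa = CODON_TABLE[b1 + b2 + b3]
                dist[aa] = dist.get(aa, 0.0) + p1 * p2 * p3
    return dist


QFK = amino_acid_distribution(Q_MIX, F_MIX, K_MIX)
for aa, p in sorted(QFK.items(), key=lambda item: -item[1]):
    print(aa, round(p, 4))
```

Because k excludes A at the third position, the only stop codon that can arise is TAG, and all twenty amino acids remain accessible at each variegated position.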

To vary the junction between M13 signal sequence and BPTI, we introduce DNA variegated at codons for residues
between 23 and 27 into the Kpn I and Xho I sites of pLG7. The first three residues are highly variable in amino acid
sequences homologous to BPTI. Homologous sequences also vary in length at the amino terminus. One of the oligo-nts
olig#14, olig#14a, or olig#14b and the primer olig#15 are synthesized by standard methods. The oligo-nts are:



residue      17  18  19  20  21  22  23  24  25
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk

 26  27  28  29  30
|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga| 3'   olig#14



residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk

 26a 26b 27  28  29  30
|qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga| 3'   olig#14a



residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk|-

 26a 26b 26c 26d 27  28  29  30
|qfk|qfk|qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga| 3'   olig#14b



5' tcg|cgg|gcg|CTC|GAG|ACA|GAA| 3'   olig#15



where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22 G), and
k is a mixture of equal parts of T and G. The bases shown in lower case at either end are spacers. One of the variegated
oligo-nts and the primer are combined in equimolar amounts and annealed. The dsDNA is completed with all four
(nt)TPs and Klenow fragment. The resulting dsDNA and RF pLG7 are cut with both Kpn I and Xho I, purified, mixed, and
ligated. This ligation mixture goes through the process described in Sec. 15 in which we select a transformed clone
that, when induced with IPTG, binds AHTrp or trp.

If none of these approaches produces a working chimeric protein, we may try a different signal sequence, or a
different OSP in M13 (e.g., the gene III protein for which there is fusion data (SMIT85, CRUZ88)), or another genetic
package.

Example 1, Part II 

BPTI binds very tightly to trypsin (Kd = 6.0 x 10^-14 M) and to anhydrotrypsin, so that these molecules are not
preferred for optimizing the amount of BPTI to display on LG7 or the amount of affinity molecule to attach to the column.
Tschesche et al. reported on the binding of several BPTI derivatives to various proteases:

Dissociation constants for BPTI derivatives, Molar.

Residue #15   Trypsin             Chymotrypsin        Elastase             Elastase
              (bovine pancreas)   (bovine pancreas)   (porcine pancreas)   (human leukocytes)
lysine        6.0 x 10^-14        9.0 x 10^-9         -                    3.5 x 10^-6
glycine                                               +                    7.0 x 10^-9
alanine                                               2.8 x 10^-8          2.5 x 10^-9
valine                                                5.7 x 10^-8          1.1 x 10^-10
leucine                           +                   1.9 x 10^-8          2.9 x 10^-9

From the report of Tschesche et al. we infer that molecular pairs marked "+" have Kds greater than 3.5 x 10^-6 M and
that molecular pairs marked "-" have Kds much greater than 3.5 x 10^-6 M. Because of the wealth of data about the
binding of BPTI and various mutants to trypsin and other proteases (TSCH87), we can proceed in various ways. (For
other PBDs we can obtain two different monoclonal antibodies, one with a high affinity having Kd of order 10^-11 M, and
one with a moderate affinity having Kd on the order of 10^-6 M.) In this example, we may use: a) the moderate binding
between BPTI and human leukocyte elastase (HuLE1), b) the moderately strong binding of porcine elastase to
BPTI(V15), or c) the binding of BPTI(A15) (residue 38 in the pbd gene) for trypsin (weak but detectable) or for porcine
pancreatic elastase.

We compare the retention of LG7 virions to the retention of wild-type M13 on {AHTrp}. M13 derivatives having
more DNA than wild-type M13 have correspondingly longer virions. Thus we will create pLG8 that differs from pLG7 only
in having stop codons at codons 2 and 3, and an altered L codon at codon 7 of the osp-ipbd gene. Phage LG8 will
have exactly as much DNA as LG7; therefore the LG8 virion is exactly as long as the LG7 virion. LG8 cannot, however,
display BPTI on its surface.

To expedite identification of different M13-derived phage, we replace the ampR gene of LG8 with the tetR gene
from pBR322 by standard methods. The BsmI-to-AatII tetR bearing fragment of pBR322 is ligated into DNA from pLG8
cut with XbaI and AatII. The correct construction, having 9.2 kb, is easily distinguished from pBR322 and is called LG10.

The phage LG7 is grown at various levels of IPTG in the medium and harvested in the way previously described.
An affinity column having bed volume of 2.0 ml and supporting an amount of HuLE1 picked from the range 0.1 mg to
30.0 mg on 1 ml of BioRad Affi-Gel 10(TM) or Affi-Gel 15(TM) is designated {HuLE1}. An appropriate set of densities of
HuLE1 on the column is (0.1 mg/ml, 0.5 mg/ml, 2.0 mg/ml, 8.0 mg/ml, 15.0 mg/ml, and 30.0 mg/ml). The Vv of {HuLE1}
is, by hypothesis, 1.0 ml. The elution of LG7 phage is compared to the elution of LG10 on {HuLE1} having varying
amounts of HuLE1 affixed. The columns are eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until optical density at 280 nm falls to base line or 4 x Vv, whichever
is first,

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH held at 8.0 with phosphate,

3) a gradient of 2 M to 5 M KCl in 3 x Vv, phosphate buffer to pH 8.0,

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in 2 x Vv, with phosphate buffer to pH 8.0.

The preferred level of induction (IPTG_optimal) and amount of affinity molecule on the matrix (DoAMoM_optimal) are those
settings that give the sharpest LG7 elution peak that shows significant retardation as compared to LG8, which carries
no BPTI. By hypothesis, the best separation occurs for the amount of BPTI/GP produced when the cells are induced
with 10.0 uM IPTG and when 4.0 mg HuLE1/ml is applied to BioRad Affi-Gel 10(TM).

When the amount of BPTI/GP and the amount of HuLE1/volume of support have been optimized, we turn to
optimization of elution rate, initial ionic strength, and the amount of GP/(volume of support). These parameters can be
optimized separately.

Using optimal BPTI/GP and HuLE1/volume of support, we measure the elution volume of LG7 and LG8 for different
elution rates, viz. 1, 1/2, 1/4, 1/8 and 1/16 times the maximum flow rate. By hypothesis, 1/4 of maximum elution rate
is better than 1/2, but 1/8 is about the same as 1/4. Therefore 1/4 maximum elution rate will be used.

Elution volumes of LG7 obtained from cells grown on media that is 2.0 mM in IPTG are measured at optimal
DoAMoM and elution rate for loadings of 10^9, 10^10, 10^11, and 10^12 pfu. By hypothesis, 10^12 pfu of pure LG7 overloads
the column and a significant number of phage elute before their characteristic position in the KCl gradient. We also find
that 10^11 pfu overloads the column only slightly, and that 10^10 pfu does not overload the column. Because the use of
the affinity separation in Sec. 15 will involve a population in which no single member is more than one part in 10^4, we
conclude that 10^12 pfu of a variegated population could be applied to a column of 1.0 ml matrix volume without
overloading with respect to any one species. The overloading of a 1.0 ml column by 10^12 pfu also indicates that the initial
column that captures indiscriminately adhesive phage should be 5 to 10 times as large as the column that supports
the target material.
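
The loading argument can be made explicit with a short calculation. This is a sketch under the hypothetical results stated above (10^10 pfu of a single species does not overload the column); the names are illustrative only.

```python
TOTAL_LOAD_PFU = 10**12           # variegated population applied to the column
MAX_SPECIES_FRACTION_INV = 10**4  # no single member exceeds one part in 10^4
NON_OVERLOADING_PFU = 10**10      # single-species load found not to overload


def max_pfu_per_species(total_pfu: int, inverse_fraction: int) -> int:
    """Largest number of pfu that any one species can contribute to the load."""
    return total_pfu // inverse_fraction


PER_SPECIES = max_pfu_per_species(TOTAL_LOAD_PFU, MAX_SPECIES_FRACTION_INV)
SAFE = PER_SPECIES <= NON_OVERLOADING_PFU
print(PER_SPECIES, SAFE)
```

Each species contributes at most 10^8 pfu, two orders of magnitude below the load that was taken not to overload the column.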

Elution volumes of LG7 and LG10 obtained from cells grown on media that is 2.0 mM in IPTG are measured at
optimal conditions and for a loading of 10^10 pfu for various initial ionic strengths: 1.0 mM, 5.0 mM, 10.0 mM, 20.0 mM,
and 50.0 mM. We may find, for example, that LG10 is slightly retarded by the column when loaded at 1.0 mM KCl, but
that LG7 always comes off the column at its characteristic place in the gradient. We use 10.0 mM as initial ionic strength
in all remaining affinity separations.

To determine the sensitivity of chromatography of phage that display variants of BPTI on their surfaces (Sec. 10.1),
we prepare artificial mixtures of two closely-related phage that differ only at one residue in the BPTI domain. One
variety of phage has strong affinity for the column used in this step, while the other phage has no affinity for the column.
We chromatograph these mixtures to discover how little of the phage that binds to the column can be detected within
a large majority of phage that do not bind the column.

For these tests we choose AHTrp as AfM(BPTI). A column having 2 ml bed volume is prepared with (DoAMoM_optimal
mg of AHTrp)/(ml of Affi-Gel 10(TM)). The column is called {AHTrp} and has Vv = 1.0 ml.



A new phage, LG9, is prepared that displays BPTI(V15) as IPBD in contrast to LG7 that displays BPTI(K15, wild-type)
as IPBD. Residue 15 of BPTI is residue 38 of the osp-ipbd gene. We introduce the change K38 to V by replacement
of a short segment of the osp-ipbd gene between Apa I & Stu I. The correct construction is called pLG9. To expedite
differentiation between LG7 and an LG9-derivative phage, we replace the ampR gene of LG9 with the tetR gene from
pBR322. DNA from pBR322 between BsmI (1353, blunted) and AatII (1428) is ligated to dsDNA from pLG9 cut with
XbaI (blunted) and AatII. The correct construction, having 9.2 kb, is easily distinguished from pBR322 and is called
LG11. DNA from phage LG11 is sequenced in the vicinity of the junctions of the newly inserted tetR gene to confirm the
construction.

LG7 and LG11 are grown with optimum IPTG (2.0 mM) and harvested. Mixtures are prepared in the ratios

LG7:LG11 :: 1:V_lim

where V_lim is varied by factors of 10. Large values of V_lim are tested first; once a V_lim is found that allows
recovery of LG7, smaller values of V_lim are not tested.

The column {AHTrp} is first blocked by treatment with 10^11 virions of M13am429 in 100 ul of 10 mM KCl buffered
to pH 8.0 with phosphate; the column is washed with the same buffer until OD280 returns to base line or 4 x Vv have
passed through the column, whichever comes first. One of the mixtures of LG7 and LG11 containing 10« pfu in 1 ml 

of the same buffer is applied to {AHTrp}. The column is eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until optical density at 280 nm falls to base line or 4 x Vv, whichever
is first (discard effluent),

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH held at 8.0 with phosphate (30 x 100 ul fractions),

3) a gradient of 2 M to 5 M KCl in 3 x Vv, phosphate buffer to pH 8.0 (30 x 100 ul fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in 2 x Vv, with phosphate buffer to pH 8.0 (20 x 100 ul fractions),

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1.2 x Vv, with phosphate buffer to pH 8.0 (12 x 100 ul fractions).

Samples of 4 ul from each fraction are plated at suitable dilution on phage-sensitive Sup- cells (so that M13am429 will
not grow). A sample of the column matrix is also used as inoculum for phage-sensitive Sup+ cells. Plaques are
transferred to ampicillin-containing LB agar, and AmpR colonies are tested for display of BPTI(K15) by use of trp* or AHTrp*.
By hypothesis, V_lim = 4.0 x 10^8 is the largest value for which LG7 can be recovered. Thus C_sensi = 4.0 x 10^8. Three
cycles of chromatography are required to isolate LG7, so the first approximation to C_eff is 740 (= exp(log_e(4.0 x 10^8)/3)).

We now determine the efficiency of the affinity separation (Sec. 10.2). This is done by: a) preparing mixtures of
LG7 and LG11 in the ratio 1:Q, b) enriching the population for LG7 for one separation cycle, and c) determining the
fraction of LG7 in the last phage-bearing fraction. When Q is 1.5 x 10^4, 3% of colonies are BPTI positive. When Q is
1.5 x 10^3, 60% of the colonies are BPTI positive. Thus we calculate C_eff = 0.60 x 1.5 x 10^3 = 900.
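
Both estimates of the per-cycle enrichment can be reproduced numerically. The sketch below uses hypothetical function names; the inputs (a sensitivity of 4.0 x 10^8 over three cycles, and 60% positive colonies at Q = 1.5 x 10^3) are the hypothetical values given in the text.

```python
import math


def efficiency_from_sensitivity(sensitivity: float, cycles: int) -> float:
    """First approximation to the per-cycle enrichment: the geometric mean
    enrichment needed to reach `sensitivity` in `cycles` separation cycles."""
    return math.exp(math.log(sensitivity) / cycles)


def efficiency_from_single_cycle(fraction_positive: float, q: float) -> float:
    """Enrichment measured directly from one cycle on a 1:Q mixture."""
    return fraction_positive * q


C_EFF_APPROX = efficiency_from_sensitivity(4.0e8, 3)
C_EFF_MEASURED = efficiency_from_single_cycle(0.60, 1.5e3)
print(round(C_EFF_APPROX), round(C_EFF_MEASURED))
```

The first estimate evaluates to about 737, which rounds to the 740 quoted above; the single-cycle measurement gives 900. The two values are of the same order, as expected for a first approximation.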

Our hypothetical LG7 should display one or more BPTI domains on each virion. The osp-ipbd gene is under control
of the lacUV5 promoter so that expression levels of BPTI-M13 CP can be manipulated via [IPTG]. This construct may
be used to develop many different binding proteins, all based on BPTI. An optimum level of induction and amount of
AfM(PBD) (= DoAMoM_optimum = 2.0 mg/(ml of support)) should have been determined; target molecules will be applied
to columns in this amount in the process disclosed in Sec. 15.1. These optimum levels may be adequate for all targets
and all variegations of BPTI displayed on derivatives of M13 based on LG7, but some further optimization may be
needed if other values of pH or temperatures are used.

Other pbd gene fragments may be substituted for the bpti gene fragment in pLG7 with a high likelihood that PBD
will appear on the surface of the new LG7 derivative.

Example 1, Part III

HHMb is chosen as a typical protein target; any other protein could be used. HHMb satisfies all of the criteria for a
target: 1) it is large enough to be applied to an affinity matrix, 2) after attachment it is not reactive, and 3) after attachment
there is sufficient unaltered surface to allow specific binding by PBDs.

The essential information for HHMb is known: 1) HHMb is stable at least up to 70°C, between pH 4.4 and 9.3, 2)
HHMb is stable up to 1.6 M guanidinium Cl, 3) the pI of HHMb is 7.0, 4) for HHMb, Mr = 16,000, 5) HHMb requires
haem, 6) HHMb has no proteolytic activity.

In addition, the following information about HHMb and other myoglobins is available: 1) the sequence of HHMb,
2) the 3D structure of sperm whale myoglobin (HHMb has 19 amino acid differences and it is generally assumed that
the 3D structures are almost identical), 3) its lack of enzymatic activity, 4) its lack of toxicity.

We set the specifications of an SBD as:

1) T = 25°C

2) pH = 8.0

3) Acceptable solutes:

A) for binding:

i) phosphate, as buffer, 0 to 20 mM, and

ii) KCl, 10 mM,

B) for column elution:

i) phosphate, as buffer, 0 to 30 mM,

ii) KCl, up to 5 M, and

iii) Guanidinium Cl, up to 0.8 M.

4) Acceptable Kd < 1.0 x 10^-8 M.

We choose LG7 as GP(IPBD). 

Residues to be varied are picked, in part, through the use of interactive computer graphics to visualize the
structures. In this section, all residue numbers refer to BPTI. We pick a set of residues that forms a surface such that all
residues can contact one target molecule. Information relevant to choosing BPTI residues to vary includes: 1) the 3D
structure, 2) solvent accessibility of each residue (LEEB71), 3) a compilation of sequences of other proteins
homologous to BPTI, and 4) knowledge of the structural nature of different amino acid types.

Tables 16 and 34 indicate which residues of BPTI: a) have substantial surface exposure, and b) are known to
tolerate other amino acids in other closely related proteins. We use interactive computer graphics to pick sets of eight
to twenty residues that are exposed and variable and such that all members of one set can touch a molecule of the
target material at one time. If BPTI has a small amino acid at a given residue, that amino acid may not be able to
contact the target simultaneously with all the other residues in the interaction set, but a larger amino acid might well
make contact. A charged amino acid might affect binding without making direct contact. In such cases, the residue
should be included in the interaction set, with a notation that larger residues might be useful. In a similar way, large
amino acids near the geometric center of the interaction set may prevent residues on either side of the large central
residue from making simultaneous contact. If a small amino acid, however, were substituted for the large amino acid,
then the surface would become flatter and residues on either side could make simultaneous contact. Such a residue
should be included in the interaction set with a notation that small amino acids may be useful.

Table 35 was prepared from standard model parts and shows the maximum span between C_beta and the tip of
each type of side group. C_beta is used because it is rigidly attached to the protein main chain; rotation about
the C_alpha-C_beta bond is the most important degree of freedom for determining the location of the side group.

Table 34 indicates five surfaces that meet the given criteria. The first surface comprises the set of residues
that contacts trypsin in the complex of trypsin with BPTI as reported in the Brookhaven Protein Data Bank entry
"1TPA". This set is indicated by the number "1". The exposed surface of the residues in this set (taken from
Table 16) totals 1148 Å^2 and approximates the area of contact between BPTI and trypsin.

Other surfaces, numbered 2 to 5, were picked by first picking one exposed, variable residue and then picking
neighboring residues until a surface was defined. The choice of sets of residues shown in Table 34 is in no way
exhaustive or unique; other sets of variable, surface residues can be picked. Hereinafter we refer to K15 as being
at the top of the molecule, while the carboxy and amino termini are at the bottom.

Solvent accessibilities are useful, easily tabulated indicators of a residue's exposure. Solvent accessibilities
must be used with some caution; small amino acids are under-represented and large amino acids over-represented.
The user must consider what the solvent accessibility of a different amino acid would be when substituted into the
structure of BPTI.

To create specific binding between a derivative of BPTI and HHMb, we will vary the residues in set #2. This set
includes the twelve principal residues 17(R), 19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V), 48(A), 49(E),
and 52(M) (Sec. 13.1.1). None of the residues in set #2 is completely conserved in the sample of sequences reported
in Table 34; thus we can vary them with a high probability of retaining the underlying structure. Independent
substitution at each of these twelve residues of the amino acid types observed at that residue would produce
approximately 4.4 x 10^9 amino acid sequences and the same number of surfaces.

BPTI is a very basic protein. This property has been used in isolating and purifying BPTI and its homologues, so
that the high frequency of arginine and lysine residues may reflect bias in isolation and is not necessarily
required by the structure. Indeed, SCI-III from Bombyx mori contains seven more acidic than basic groups (SASA84).

Residue 17 is highly variable and fully exposed and can contain R, K, A, Y, H, F, L, M, I, G, Y, P, or S. All
types of amino acids are seen: large, small, charged, neutral, and hydrophobic. That no acidic groups are observed
may be due to bias in the sample.

Residue 19 is also variable and fully exposed, containing P, R, I, S, K, Q, and L.

Residue 21 is not very variable, containing F or Y in 31 of 33 cases and I and W in the remaining cases. The
side group of Y21 fills the space between T32 and the main chain of residues 47 and 48. The OH at the tip of the Y
side group projects into the solvent. Clearly one can vary the surface by substituting Y or F so that the surface
is either hydrophobic or hydrophilic in that region. It is also possible that the other aromatic amino acid
(viz., H) or the other hydrophobics (L, M, or V) might be tolerated.

Residue 27 most often contains A, but S, K, L, and T are also observed. On structural grounds, this residue will
probably tolerate any hydrophilic amino acid and perhaps any amino acid.

Residue 28 is G in BPTI. This residue is in a turn, but is not in a conformation peculiar to glycine. Six other
types of amino acids have been observed [...] contact HHMb simultaneously with residues 17 and 34. Large side
groups could interact with HHMb at the same time as residues 17 and 34. Charged side groups at this residue could
affect binding of HHMb on the surface defined by the other residues of the principal set. Any amino acid, except
perhaps P, should be tolerated.

Residue 29 is highly variable, most often containing L. This fully exposed position will probably tolerate almost
any amino acid except, perhaps, P.

Residues 31, 32, and 34 are highly variable, exposed, and in extended conformations; any amino acid should be
tolerated.

Residues 48 and 49 are also highly variable and fully exposed; any amino acid should be tolerated.

Residue 52 is in an alpha helix. Any amino acid, except perhaps P, might be tolerated.

Now we consider possible variation of the secondary set (Sec. 13.1.2) of residues that are in the neighborhood
of the principal set. Neighboring residues that might be varied at later stages include 9(P), 11(T), 15(K), 16(A),
18(I), 20(R), 22(F), 24(N), 26(K), 35(Y), 47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and exposed. Residue 9 and residues 48 and 49 are separated by a bulge
caused by the ascending chain from residue 31 to 34. For residue 9 and residues 48 and 49 to contribute
simultaneously to binding, either the target must have a groove into which the chain from 31 to 34 can fit, or all
three residues (9, 48, and 49) must have large amino acids that effectively reduce the radius of curvature of the
BPTI derivative.

Residue 11 is highly variable, extended, and exposed. Residue 11, like residue 9, is slightly far from the surface
defined by the principal residues and will contribute to binding in the same circumstances.

Residue 15 is highly varied. The side group of residue 15 points away from the face defined by set #2. Changes
of charge at residue 15 could affect binding on the surface defined by residue set #2.

Residue 16 is varied but points away from the surface defined by the principal set. Changes in charge at this
residue could affect binding on the face defined by set #2.

Residue 18 is I in BPTI. This residue is in an extended conformation and is exposed. Five other amino acids have
been observed at this residue: M, F, L, V, and T. Only T is hydrophilic. The side group points directly away from
the surface defined by residue set #2. Substitution of charged amino acids at this residue could affect binding at
the surface defined by residue set #2.

Residue 20 is R in BPTI. This residue is in an extended conformation and is exposed. Four other amino acids have
been observed at this residue: A, S, L, and Q. The side group points directly away from the surface defined by
residue set #2. Alteration of the charge at this residue could affect binding at the surface defined by residue
set #2.

Residue 22 is only slightly varied, being Y, F, or H in 30 of 33 cases. Nevertheless, A, N, and S have been
observed at this residue. Amino acids such as L, M, I, or Q could be tried here. Alterations at residue 22 may
affect the mobility of residue 21; changes in charge at residue 22 could affect binding at the surface defined by
residue set #2.

Residue 24 shows some variation, but probably cannot interact with one molecule of the target simultaneously
with all the residues in the principal set. Variation in charge at this residue might have an effect on binding at
the surface defined by the principal set.

Residue 26 is highly varied and exposed. Changes in charge may affect binding at the surface defined by residue
set #2; substitutions may affect the mobility of residue 27 that is in the principal set.

Residue 35 is most often Y; W has been observed. The side group of 35 is buried, but substitution of F or W could
affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample used. The O_gamma probably accepts a hydrogen bond from
the NH of residue 50 in the alpha helix. Nevertheless, there is no overwhelming steric reason to preclude other
amino acid types at this residue. In particular, other amino acids the side groups of which can accept hydrogen
bonds, viz., N, D, Q, and E, may be acceptable here.

Residue 50 is often an acidic amino acid, but other amino acids are possible.

Residue 53 is often R, but other amino acids have been observed at this residue. Changes of charge may affect
binding to the amino acids in interaction set #2.

From published models (HUBE77, WLOD84) one can see that R39 is on the opposite side of BPTI from the surface
defined by the residues in set #2. Therefore, variation at residue 39 at the same time as variation of some
residues in set #2 is much less likely to improve binding that occurs along surface #2 than is variation of the
other residues in set #2.

In addition to the twelve principal residues and 13 secondary residues, there are two other residues, 30(C) and
33(F), involved in surface #2 that we will probably not vary, at least not until late in the procedure. These
residues have their side groups buried inside BPTI and are conserved. Changing these residues does not change the
surface nearly so much as does changing residues in the principal set. These buried, conserved residues do,
however, contribute to the surface area of surface #2. The surface of residue set #2 is comparable to the area of
the trypsin-binding surface. Principal residues 17, 19, 21, 27, 28, 29, 31, 32, 34, 48, 49, and 52 have a combined
solvent-accessible area of 946.9 Å^2. Secondary residues 9, 11, 15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53 have
a combined surface of 1041.7 Å^2. Residues 30 and 33 have exposed surface totaling 38.2 Å^2. Thus the three groups'
combined surface is 2026.8 Å^2.

Residue 30 is C in BPTI and is conserved in all homologous sequences. It should be noted, however, that C14/C38
is conserved in all natural sequences, yet Marks et al. (MARK87) showed that changing both C14 and C38 to A,A or
T,T yields a functional trypsin inhibitor. Thus it is possible that BPTI-like molecules will fold if C30 is
replaced.

Residue 33 is F in BPTI and in all homologous sequences. Visual inspection of the BPTI structure suggests that
substitution of Y, M, H, or L might be tolerated.

Given our hypothetical affinity separation sensitivity, C_sensi, we decide to vary six residues, leaving some
margin for errors in the actual base composition of variegated bases. To obtain maximal recognition, we choose
residues from the principal set that are as far apart as possible. Table 36 shows the distances between the beta
carbons of residues in the principal and peripheral set. R17 and V34 are at one end of the principal surface.
Residues A27, G28, L29, A48, E49, and M52 are at the other end, about twenty Angstroms away; of these, we will vary
residues 17, 27, 29, 34, and 48. Residues 28, 49, and 52 will be varied at later rounds.

Of the remaining principal residues, 21 is left to later variations. Among residues 19, 31, and 32, we arbitrarily
pick 19 to vary.

Unlimited variation of six residues produces 6.4 x 10^7 amino acid sequences. By hypothesis, C_sensi is 1 in
4 x 10^8. Table 37 shows the programmed variegation at the chosen residues. The parental sequence is present as
1 part in 5.5 x 10^7, but the least favored sequences are present at only 1 part in 4.2 x 10^9. Among
single-amino-acid substitutions from the PPBD, the least favored is F17-I19-A27-L29-V34-A48 and has a calculated
abundance of 1 part in 1.6 x 10^8. Using the optimal qfk codon, we can recover the parental sequence and all
one-amino-acid substitutions to the PPBD if actual nt compositions come within 5% of programmed compositions. The
number of transformants is M_ntv = 1.0 x 10^9 (also by hypothesis), thus we will produce most of the programmed
sequences.
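
One rough way to see why 10^9 transformants cover most, but not all, of the programmed sequences is to treat each transformant as an independent draw from the programmed distribution. A sequence of abundance p is then present at least once among M transformants with probability 1 - exp(-M p). The Poisson sampling model and the function name below are assumptions added for illustration; the abundances are the ones quoted in the paragraph above.

```python
import math

def prob_present(abundance: float, transformants: float) -> float:
    """Chance that a sequence of the given abundance occurs at least once
    among `transformants` independent draws (Poisson approximation)."""
    return 1.0 - math.exp(-abundance * transformants)

M_NTV = 1.0e9                  # number of transformants, by hypothesis
unlimited = 20 ** 6            # all amino-acid sequences at six residues

quoted_abundances = {
    "parental sequence": 1 / 5.5e7,
    "least-favored single substitution": 1 / 1.6e8,
    "least-favored sequence": 1 / 4.2e9,
}

print(f"sequences from unlimited variation of six residues: {unlimited:.1e}")
for label, p in quoted_abundances.items():
    print(f"{label}: P(present) = {prob_present(p, M_NTV):.3f}")
```

Under this model the parental sequence and every single substitution are almost certainly represented, while the rarest programmed sequences are present only some of the time, which matches the statement that most, rather than all, sequences are produced.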

The residue numbers above refer to mature BPTI. Since Table 25 refers to the pre-M13CP-BPTI protein, all mature
BPTI sequence numbers have been increased by the length of the signal sequence, 23. Thus, we wish to vary residues
40, 42, 50, 52, 57, and 71. A DNA subsequence containing all these codons is found between the Apa I site at base
191 and the Sph I site at base 309 of the osp-pbd gene. Among Apa I, Dra II, and Pss I, Apa I is preferred because
it recognizes six bases without any ambiguity and will cut fewer sequences in the vgDNA. Gratuitous restriction
sites can be avoided in some cases by use of codon ambiguity: changing the codon for G51 from GGC to GGT makes it
impossible to generate an Apa I site at codons 50, 51, and 52.
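
The renumbering and the codon-ambiguity argument can be checked with a short sketch. The offset of 23 and the residue list come from the text; the Apa I recognition sequence GGGCCC is the standard one and is not quoted in this section, and the helper names are hypothetical. The site check enumerates every possible codon on either side of codon 51, which is a superset of what the variegated codons 50 and 52 can actually encode.

```python
from itertools import product

SIGNAL_LENGTH = 23                          # signal-sequence length, from the text
MATURE_RESIDUES = [17, 19, 27, 29, 34, 48]  # mature BPTI numbering

def to_fusion_numbering(residues, offset=SIGNAL_LENGTH):
    """Shift mature-protein residue numbers to pre-protein numbering."""
    return [r + offset for r in residues]

APA_I_SITE = "GGGCCC"  # assumed standard Apa I recognition sequence

def can_create_site(middle_codon, site=APA_I_SITE):
    """True if some pair of flanking codons places `site` anywhere in the
    nine-base window codon50-codon51-codon52."""
    for left in product("TCAG", repeat=3):
        for right in product("TCAG", repeat=3):
            if site in "".join(left) + middle_codon + "".join(right):
                return True
    return False

print(to_fusion_numbering(MATURE_RESIDUES))
print("GGC at codon 51 can give a site:", can_create_site("GGC"))
print("GGT at codon 51 can give a site:", can_create_site("GGT"))
```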

Each piece of dsDNA to be synthesized needs six to eight bases added at either end to allow cutting with
restriction enzymes and is shown in Table 37. The first synthetic base (before cutting with Apa I and Sph I) is 184
and the last is 322. There are 142 bases to be synthesized. The center of the piece to be synthesized lies between
Q54 and V57. The overlap cannot include varied bases, so we choose bases 245 to 256 as the overlap, which is 12
bases long. Note that the codon for F56 has been changed to TTC to increase the GC content of the overlap. The
amino acids that are being varied are marked as X with a plus over them. Codons 57 and 71 are synthesized on the
sense (bottom) strand. The design calls for "qfk" in the antisense strand, so that the sense strand contains (from
5' to 3') a) equal parts C and A (i.e., the complement of k), b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) (i.e., the
complement of f), and c) (0.26 T, 0.26 A, 0.30 C, and 0.18 G).

Each residue that is encoded by "qfk" has 21 possible outcomes: each of the amino acids plus stop. Table 12 gives
the distribution of amino acids encoded by "qfk", assuming 5% errors. The abundance of the parental sequence is the
product of the abundances of R x I x A x L x V x A. The abundance of the least-favored sequence is 1 in 4.2 x 10^9.
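
The product rule just described can be written out directly. The sketch below uses the relevant Table 12 abundances (5% error case); the dictionary and function names are illustrative only. Because the table entries are rounded to two decimals, the computed values agree with the quoted figures only approximately.

```python
from math import prod

# Abundances from Table 12 (optimum vgCodon, 5% errors), as fractions.
ABUNDANCE = {
    "R": 0.0768, "I": 0.0271, "A": 0.0459,
    "L": 0.0671, "V": 0.0600, "F": 0.0249,
}

def sequence_abundance(residues: str) -> float:
    """Abundance of one sequence when each varied position is drawn
    independently from the same codon distribution."""
    return prod(ABUNDANCE[aa] for aa in residues)

parental = sequence_abundance("RIALVA")   # R17 I19 A27 L29 V34 A48
least = sequence_abundance("FFFFFF")      # F is the least-favored amino acid

print(f"parental sequence: 1 part in {1 / parental:.2e}")
print(f"least-favored sequence: 1 part in {1 / least:.2e}")
```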

Olig#27 and olig#28 are annealed and extended with Klenow fragment and all four (nt)TPs. Both the ds synthetic
DNA and RF pLG7 DNA are cut with both Apa I and Sph I. The cut DNA is purified and the appropriate pieces ligated
(see Sec. 14.1) and used to transform competent PE383 (Sec. 14.2). In order to generate a sufficient number of
transformants, we start with 5.0 l of cells.

1) culture E. coli in 5.0 l of LB broth at 37°C until cell density reaches 5 x 10^7 to 7 x 10^7 cells/ml,

2) chill on ice for 65 minutes, centrifuge the cell suspension at 4000g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in 1667 ml of an ice-cold, sterile solution of 60 mM CaCl2,

4) chill on ice for 15 minutes, and then centrifuge at 4000g for 5 minutes at 4°C,

5) resuspend cells in 2 x 400 ml of ice-cold, sterile 60 mM CaCl2; store cells at 4°C for 24 hours,

6) add DNA (100 µg) in 20 ml of ligation or TE buffer; mix, incubate on ice for [...] minutes,

7) distribute into 200 µl aliquots and heat shock cells at 42°C for 20 seconds,

8) add 200 ml LB broth and incubate at 37°C for 1 hour,

9) add the culture to 2.0 l of LB broth containing ampicillin at 35-100 µg/ml and culture overnight at 37°C,

10) after 6 hours, remove 200 ml and plate 0.5 ml portions with log phase JM107 on LB agar, using the soft-agar
overlay technique. Phage are prepared from the soft agar,

11) centrifuge the overnight culture to remove cells, and pellet phage (MESS83),

12) harvest virions by method of Salivar, et al. (SALI64).

It is important to: a) use all or nearly all the vgDNA synthesized in ligation, b) use all or nearly all the
ligation mixture to transform cells, and c) culture all or nearly all the transformants. These measures are
directed at maintaining diversity.

It is important to collect virions in a way that samples all or nearly all the transformants. Because F- cells
are used in the transformation, multiple infections do not pose a problem in the overnight phage production.
F+ cells are used for phage production in agar.

HHMb has a pI of 7.0 and we carry out chromatography at pH 8.0 so that HHMb is slightly negative while BPTI
and most of its mutants are positive. HHMb is fixed (Sec. 15.1) to a 2.0 ml column on Affi-Gel 10(TM) or
Affi-Gel 15(TM) at 4.0 mg/ml support matrix, the same density that is optimal for a column supporting trp.

To remove variants of BPTI with strong, indiscriminate binding for any protein or for the support matrix
(Sec. 15.2), we pass the variegated population of virions over a column that supports bovine serum albumin (BSA)
before loading the population onto the {HHMb} column. Affi-Gel 10(TM) or Affi-Gel 15(TM) is used to immobilize BSA
at the highest level the matrix will support. A 10.0 ml column is loaded with 5.0 ml of Affi-Gel-linked-BSA; this
column, called {BSA}, has V_v = 5.0 ml. The variegated population of virions containing 10^12 pfu in 1 ml
(0.2 x V_v) of 10 mM KCl, 1 mM phosphate, pH 8.0 buffer is applied to {BSA}. We wash {BSA} with 4.5 ml (0.9 x V_v)
of 50 mM KCl, 1 mM phosphate, pH 8.0 buffer. The wash with 50 mM salt will elute virions that adhere slightly to
BSA but not virions with strong binding. The pooled effluent of the {BSA} column is 5.5 ml of approximately 13 mM
KCl.

The column {HHMb} is first blocked by treatment with 10^11 virions of M13(am429) in 100 µl of 10 mM KCl buffered
to pH 8.0 with phosphate; the column is washed with the same buffer until OD_260 returns to base line or 2 x V_v
have passed through the column, whichever comes first. The pooled effluent from {BSA} is added to {HHMb} in 5.5 ml
of 13 mM KCl, 1 mM phosphate, pH 8.0 buffer. The column is eluted (Sec. 15.3) in the following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until optical density at 280 nm falls to base line or 2 x V_v,
whichever is first (effluent discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH held at 8.0 with phosphate (30 x 100 µl fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_v, phosphate buffer to pH 8.0 (30 x 100 µl fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in 2 x V_v, with phosphate buffer to pH 8.0 (20 x 100 µl
fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1 x V_v, with phosphate buffer to pH 8.0 (10 x 100 µl fractions).
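
To relate a fraction number to an eluent concentration, the gradient steps above can be tabulated as follows. This is a minimal sketch that assumes each gradient is linear in volume and that fractions are of equal size; neither assumption is stated explicitly in the protocol, and the function name is illustrative.

```python
def linear_gradient(start_molar: float, end_molar: float, n_fractions: int):
    """Midpoint concentration (M) of each equal-volume fraction collected
    across a gradient that is assumed to be linear."""
    step = (end_molar - start_molar) / n_fractions
    return [start_molar + step * (i + 0.5) for i in range(n_fractions)]

step2_kcl = linear_gradient(0.010, 2.0, 30)   # step 2: 10 mM to 2 M KCl
step3_kcl = linear_gradient(2.0, 5.0, 30)     # step 3: 2 M to 5 M KCl
step4_gdn = linear_gradient(0.0, 0.8, 20)     # step 4: guanidinium Cl at 5 M KCl

for number, conc in enumerate(step2_kcl[:3], start=1):
    print(f"step 2, fraction {number}: about {conc * 1000:.0f} mM KCl")
```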

In addition to the elution fractions, a sample is removed from the column and used as an inoculum for
phage-sensitive Sup- cells (Sec. 15.4). A sample of 4 µl from each fraction is plated on phage-sensitive Sup-
cells. Fractions that yield too many colonies to count are replated at higher dilution. An approximate titre of
each fraction is calculated. Starting with the last fraction and working toward the first fraction that was
titered, we pool fractions until approximately 10^9 phage are in the pool, i.e., about 1 part in 1000 of the phage
applied to the column. This population is infected into 3 x 10^11 phage-sensitive PE384 in 300 ml of LB broth. The
low multiplicity of infection is chosen to reduce the possibility of multiple infection. After thirty minutes,
viable phage have entered recipient cells but have not yet begun to produce new phage. Phage-borne genes are
expressed at this phase, and we can add ampicillin that will kill uninfected cells. These cells still carry F-pili
and will absorb phage, helping to prevent multiple infections.
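
The pooling rule (start at the last titered fraction and work backward until about 10^9 phage are collected) can be sketched as follows. The titres in the example are made up for illustration, and the function name is hypothetical.

```python
def pool_from_end(titres, target=1e9):
    """Pool fractions from the last toward the first until the pooled phage
    count reaches `target`. Returns (pooled indices, total phage)."""
    pooled, total = [], 0.0
    for index in range(len(titres) - 1, -1, -1):
        if total >= target:
            break
        pooled.append(index)
        total += titres[index]
    return pooled, total

# Hypothetical phage counts per fraction, earliest fraction first.
example_titres = [5e11, 2e11, 4e9, 6e8, 3e8, 2e8, 5e7]
indices, total = pool_from_end(example_titres)
print("pooled fraction indices:", indices)
print(f"phage in pool: {total:.2e}")
```

Working from the end of the elution means the pool is drawn from the most tightly retained phage, which is the population the next separation cycle is meant to enrich.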

If multiple infection should pose a problem that cannot be solved by growth at low multiplicity of infection on
F+ cells, the following procedure can be employed to obviate the problem. Virions obtained from the affinity
separation are infected into F+ E. coli and cultured to amplify the genetic messages (Sec. 15.5). CCC DNA is
obtained either by harvesting RF DNA or by in vitro extension of primers annealed to ss phage DNA. The CCC DNA is
used to transform F- cells at a high ratio of cells to DNA. Individual virions obtained in this way should bear
proteins encoded only by the DNA within.

The variegation produces as many as 6.4 x 10^7 different amino-acid sequences. C_eff is 900. Thus, after two
separation cycles, the probability of isolating a single SBD is less than 0.10; after three cycles, the probability
rises above [...]

The phagemid population is grown and chromatographed three times and then examined for SBDs (Sec. 15.7).
In each separation cycle, phage from the last three fractions that contain viable phage are pooled with phage
obtained by removing some of the support matrix as an inoculum. At each cycle, about 10^12 phage are loaded onto
the column and about 10^9 phage are cultured for the next separation cycle. After the third separation cycle, 32
colonies are picked from the last fraction that contained viable phage; phage from these colonies are denoted
SBD1, SBD2, ..., and SBD32.
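
The reason three separation cycles are used can be illustrated with simple arithmetic. The sketch assumes, for illustration only, that each cycle enriches a successful binder by a constant factor C_eff = 900 over the rest of the population, starting from one part in 6.4 x 10^7; the function name is hypothetical and the model ignores losses and amplification bias.

```python
def binder_fraction(library_size: float, c_eff: float, cycles: int) -> float:
    """Approximate fraction of the population that is the successful binder
    after `cycles` rounds, each enriching it by a factor `c_eff`."""
    return min(1.0, c_eff ** cycles / library_size)

LIBRARY_SIZE = 6.4e7   # amino-acid sequences from varying six residues
C_EFF = 900            # enrichment per separation cycle, by hypothesis

for cycles in (1, 2, 3):
    fraction = binder_fraction(LIBRARY_SIZE, C_EFF, cycles)
    print(f"after {cycles} cycle(s): binder fraction about {fraction:.3g}")
```

Under this simple model two cycles leave the binder at roughly one percent of the population, well below 0.10, while a third cycle carries it past the point where it dominates.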

Each of the SBDs is cultured and tested for retention on a Pep-Tie column supporting HHMb (Sec. 15.8). Phage
LG7(SBD11) shows the greatest retention on the Pep-Tie {HHMb} column, eluting at 367 mM KCl while wt M13 elutes
at 20 mM KCl. SBD11 becomes the parental amino-acid sequence to the second variegation cycle.

The result of this hypothetical experiment is shown in Table 38. R40 changed to D, I42 changed to Q, A50 changed
to E, L52 remained L, and A71 changed to W.

The next round of variegation (Sec. 16) is illustrated in Table 39. The residues to be varied are chosen by: a)
choosing some of the residues in the principal set that were not varied in the first round (viz., residues 42, 44,
51, 54, 55, 72, or 75 of the fusion), and b) choosing some residues in the secondary set. Residues 51, 54, 55, and
72 are varied through all twenty amino acids and, unavoidably, stop. Residue 44 is only varied between Y and F.
Some residues in the secondary set are varied through a restricted range, primarily to allow different charges
(+, 0, -) to appear. Residue 38 is varied through K, R, E, or G. Residue 41 is varied through I, V, K, or E.
Residue 43 is varied through R, S, G, N, K, D, E, T, or A.

Olig#29 and olig#30 are synthesized, annealed, extended, and cloned into pLG7 at the Apa I/Sph I sites. The
ligation mixture is used to transform 5 l of competent PE383 cells so that 10^9 transformants are obtained. A new
{HHMb} is constructed using the same support matrix as was used in round 1. A sample of 10^12 of the harvested LG7
are applied to {HHMb} and affinity separated. The last 10^9 phage off the column and an inoculum are pooled and
cultured. The cultured phagemids are re-chromatographed for three separation cycles. Thirty-two clonal isolates
(denoted SBD11-1, SBD11-2, ..., SBD11-32) are obtained from the effluent of the third separation cycle and tested
for binding on a Pep-Tie {HHMb} column. Of this set, SBD11-23 shows the greatest retention on the Pep-Tie {HHMb}
column, eluting at 692 mM KCl.

The result of this hypothetical selection is shown in Table 40. Residue 38 (K15 of BPTI) changed to E, 41 becomes
V, 43 goes to N, 44 goes to F, 51 goes to F, 54 goes to S, 55 goes to A, and 72 goes to Q.

The sbd11-23 portion of the osp-pbd gene is cloned into an expression vector and BPTI(E15, D17, V18, Q19, N20,
F21, E27, F28, L29, S31, A32, S34, W71, Q72) is expressed in the periplasm. This protein is isolated by standard
methods and its binding to HHMb is tested. K_d is found to be 4.5 x 10^-7 M.

A third round of variation, using SBD11-23 as PPBD, is illustrated in Table 41; eight amino acids are varied.
Those in the principal set, residues 40, 55, and 57, are varied through all twenty amino acids. Residue 32 is
varied through P, Q, T, K, A, or E. Residue 34 is varied through T, P, Q, K, A, or E. Residue 44 is varied through
F, L, Y, C, W, or stop. Residue 50 is varied through E, K, or Q. Residue 52 is varied through L, F, I, M, or V.

The result of this variation is shown in Table 42. The selected SBD is denoted SBD11-23-5 and elutes from a
Pep-Tie {HHMb} column at 980 mM KCl. The sbd11-23-5 segment is cloned into an expression vector and BPTI(E9, Q11,
E15, A17, V18, Q19, N20, W21, Q27, F28, M29, S31, L32, H34, W71, Q72) is produced. This time the K_d is
7.3 x 10^-9 M.

This example is hypothetical. It is anticipated that more variegation cycles will be needed to achieve
dissociation constants of 10^-8 M. It is also possible that more than three separation cycles will be needed in
some variegation cycles. Real DNA chemistry and DNA synthesizers may have larger errors than our hypothetical 5%.
If S_err > 0.05, then we may not be able to vary six residues at once; variation of 5 residues at once is certainly
possible.

Table 2: Preferred Outer-Surface Proteins

Genetic            Preferred
Package            Outer-Surface      Reason for preference
                   Protein

M13                coat protein       a) exposed amino terminus,
                   (gpVIII)           b) predictable post-translational
                                         processing,
                                      c) numerous copies in virion.

                   gp III             a) fusion data available.

PhiX174            G protein          a) known to be on virion exterior,
                                      b) small enough that the G-ipbd gene
                                         can replace H gene.

E. coli            LamB               a) fusion data available,
                                      b) non-essential.

B. subtilis        CotC               a) no post-translational processing,
spores                                b) distinctive sequence that causes
                                         protein to localize in spore coat,
                                      c) non-essential.

                   CotD               Same as for CotC.


Table 7: Atomic radii (Angstroms)

C_alpha         1.70
O_carbonyl      1.52
N_amide         1.55
Other atoms     1.80

Table 8

Fraction of DNA molecules having n non-parental bases when reagents that have fraction M of parental nt are used.

M      .9965      .97716     .92612     .8577      .79433     .63096

f0     .9000      .5000      .1000      .0100      .0010      .000001
f1     .09499     .35061     .2393      .04977     .00777     .0000175
f2     .00485     .1188      .2768      .1197      .0292      .000149
f3     .00016     .0259      .2061      .1854      .0705      .000812
f4     .000004    .00409     .1110      .2077      .1232      .003207
f8     0.         2 x 10^-7  .00096     .0336      .1182      .080165
f16    0.         0.         0.         5 x 10^-7  .00006     .027281
f23    0.         0.         0.         0.         0.         .0000089
most   0          0          2          5          7          12

"most" is the value of n having the highest probability.
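
The entries of Table 8 are consistent with a binomial distribution over a 30-base segment: the f0 row equals M raised to the 30th power for every column. The segment length of 30 is inferred from that row rather than stated in the table, so it appears below as a labeled assumption; the function name is illustrative.

```python
from math import comb

SEGMENT_LENGTH = 30  # assumption: inferred from the f0 row (f0 = M ** 30)

def fraction_with_n_changes(n: int, m_parental: float,
                            length: int = SEGMENT_LENGTH) -> float:
    """Binomial probability that exactly n of `length` bases are non-parental
    when each base is parental with probability `m_parental`."""
    p_change = 1.0 - m_parental
    return comb(length, n) * p_change ** n * m_parental ** (length - n)

for m in (0.9965, 0.97716, 0.92612, 0.8577, 0.79433, 0.63096):
    row = [fraction_with_n_changes(n, m) for n in (0, 1, 2, 3, 4)]
    print(f"M = {m}: " + "  ".join(f"f{n} = {v:.4g}" for n, v in enumerate(row)))
```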

Table 9: best vgCodon

Program "Find Optimum vgCodon."

INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R + K
. . . . . . g2 = (g1*a2 - 0.5*a1*a2)/(c1 + 0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2 .gt. 0.1 .and. t2 .gt. 0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_IF_block ! if g1 big enough
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1

WRITE the best distribution and the abundances.
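
A runnable sketch of the Table 9 search is given below. Two points are assumptions rather than statements from the table: CALCULATE-ABUNDANCES is taken to mean translating the variegated codon with the standard genetic code (third base equal parts T and G), and COMPARE-ABUNDANCES is taken to mean keeping the distribution with the highest ratio of least-favored to most-favored amino acid. The nominal q and f fractions are obtained by complementing the sense-strand compositions given in the text. All function and variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def abundances(first, second, third):
    """Amino-acid distribution ('*' = stop) for a codon whose three positions
    have the given base fractions (dicts mapping base -> fraction)."""
    out = {}
    for codon, aa in CODE.items():
        p = (first.get(codon[0], 0.0) * second.get(codon[1], 0.0)
             * third.get(codon[2], 0.0))
        out[aa] = out.get(aa, 0.0) + p
    return out

def frange(lo, hi, step):
    count = int((hi - lo) / step + 1e-9)
    return [lo + i * step for i in range(count + 1)]

def find_optimum(step=0.01):
    """Grid search of Table 9; the score (lfaa/mfaa ratio) is an assumption."""
    k = {"T": 0.5, "G": 0.5}
    best = None
    for t1 in frange(0.21, 0.31, step):
        for c1 in frange(0.13, 0.23, step):
            for a1 in frange(0.23, 0.33, step):
                g1 = 1.0 - t1 - c1 - a1
                if g1 < 0.15 - 1e-9:
                    continue
                for a2 in frange(0.37, 0.50, step):
                    for c2 in frange(0.12, 0.20, step):
                        # Force D+E = R+K, as in the table.
                        g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
                        t2 = 1.0 - a2 - c2 - g2
                        if g2 <= 0.1 or t2 <= 0.1:
                            continue
                        q = {"T": t1, "C": c1, "A": a1, "G": g1}
                        f = {"T": t2, "C": c2, "A": a2, "G": g2}
                        amino = [v for aa, v in abundances(q, f, k).items()
                                 if aa != "*"]
                        score = min(amino) / max(amino)
                        if best is None or score > best[0]:
                            best = (score, q, f)
    return best

# Nominal "qfk" fractions, complemented from the sense-strand values in the text.
Q_NOMINAL = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F_NOMINAL = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K_NOMINAL = {"T": 0.5, "G": 0.5}

nominal = abundances(Q_NOMINAL, F_NOMINAL, K_NOMINAL)
print("D+E =", round(nominal["D"] + nominal["E"], 4),
      " R+K =", round(nominal["R"] + nominal["K"], 4))
print("coarse search (step 0.05):", find_optimum(step=0.05))
```

The full grid at the table's step of 0.01 visits well over 100,000 combinations and takes noticeably longer in pure Python, which is why the usage line above runs a coarse grid only.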

Table 11: Calculate worst codon.

Program "Find worst vgCodon within Serr of given distribution."

INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is % error level.
READ Serr
Comment T1i,C1i,A1i,G1i, T2i,C2i,A2i,G2i, T3i,G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1. - Serr
Fup = 1. + Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps )
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps )
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps )
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr )
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr )
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps )
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps )
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps )
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr )
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr )
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( g2 .gt. 0.0 .and. t2 .gt. 0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! g2
. . . . end_DO_loop ! c2
. . . end_DO_loop ! a2
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1

WRITE the WORST distribution and the abundances.
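
The Table 11 procedure can be sketched in Python along the same lines. As assumptions for illustration: abundances are computed with the standard genetic code, "worst" is taken to mean the lowest ratio of least-favored to most-favored amino acid, and the intended distribution is the nominal qfk composition derived from the text. Names are hypothetical. The example call uses 3 steps per base for speed, whereas the table specifies 7.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def abundances(first, second, third):
    """Amino-acid distribution ('*' = stop) for per-position base fractions."""
    out = {}
    for codon, aa in CODE.items():
        p = (first.get(codon[0], 0.0) * second.get(codon[1], 0.0)
             * third.get(codon[2], 0.0))
        out[aa] = out.get(aa, 0.0) + p
    return out

def ratio(ab):
    """Least-favored over most-favored amino-acid abundance (stop excluded)."""
    amino = [v for aa, v in ab.items() if aa != "*"]
    return min(amino) / max(amino)

def linspace(lo, hi, n):
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def pin(derived, intended, others, s_err):
    """If the derived fraction strays more than s_err (relative) from its
    intended value, pin it at the limit and rescale the others to sum to 1."""
    lo, hi = intended * (1 - s_err), intended * (1 + s_err)
    if lo <= derived <= hi:
        return derived, list(others)
    pinned = lo if derived < lo else hi
    factor = (1.0 - pinned) / sum(others)
    return pinned, [x * factor for x in others]

def find_worst(q, f, s_err=0.05, steps=7):
    """Search distributions within s_err of the intended q and f (third base
    nominally half T, half G) for the lowest lfaa/mfaa ratio."""
    down, up = 1 - s_err, 1 + s_err
    worst = None
    for t1, c1, a1 in product(*(linspace(q[b] * down, q[b] * up, steps)
                                for b in "TCA")):
        g1, (t1, c1, a1) = pin(1 - t1 - c1 - a1, q["G"], [t1, c1, a1], s_err)
        first = {"T": t1, "C": c1, "A": a1, "G": g1}
        for a2, c2, g2 in product(*(linspace(f[b] * down, f[b] * up, steps)
                                    for b in "ACG")):
            t2, (a2, c2, g2) = pin(1 - a2 - c2 - g2, f["T"], [a2, c2, g2], s_err)
            second = {"T": t2, "C": c2, "A": a2, "G": g2}
            for t3 in (0.5 * down, 0.5, 0.5 * up):
                r = ratio(abundances(first, second, {"T": t3, "G": 1 - t3}))
                if worst is None or r < worst[0]:
                    worst = (r, first, second, t3)
    return worst

Q_NOMINAL = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F_NOMINAL = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}

nominal_ratio = ratio(abundances(Q_NOMINAL, F_NOMINAL, {"T": 0.5, "G": 0.5}))
worst = find_worst(Q_NOMINAL, F_NOMINAL, s_err=0.05, steps=3)
print(f"nominal lfaa/mfaa ratio: {nominal_ratio:.4f}")
print(f"worst ratio found with 3 steps per base: {worst[0]:.4f}")
```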

Table 12: Abundances obtained using optimum vgCodon assuming 5% errors

Amino                     Amino
acid    Abundance         acid    Abundance

A       4.59%             C       2.76%
D       5.45%             E       6.02%
F       2.49% lfaa        G       6.63%
H       3.59%             I       2.71%
K       5.73%             L       6.71%
M       3.00%             N       5.19%
P       3.02%             Q       3.97%
R       7.68% mfaa        S       7.01%
T       4.37%             V       6.00%
W       3.05%             Y       4.77%
stop    5.27%

ratio = Abun(F)/Abun(R) = 0.3248

j    (1/ratio)^j    (ratio)^j       stop-free

1    3.079          .3248           .9473
2    9.481          .1055           .8973
3    29.193         .03425          .8500
4    89.888         .01112          .8052
5    276.78         3.61 x 10^-3    .7627
6    852.22         1.17 x 10^-3    .7225
7    2624.1         3.81 x 10^-4    .6844
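
The lower part of Table 12 follows from two numbers in the upper part: the quoted ratio Abun(F)/Abun(R) and the stop abundance. The sketch below regenerates the three columns for j = 1 to 7 varied residues; reading the first column as the row index j is an interpretation of the garbled header, and small differences from the printed values reflect rounding in the source.

```python
RATIO = 0.3248         # Abun(F)/Abun(R), as quoted in Table 12
STOP_FRACTION = 0.0527 # abundance of stop codons, from Table 12

def table_row(j: int):
    """Columns of Table 12 for j varied residues: (1/ratio)^j, ratio^j, and
    the fraction of sequences with no stop codon at any varied residue."""
    return (1 / RATIO) ** j, RATIO ** j, (1 - STOP_FRACTION) ** j

for j in range(1, 8):
    inverse, direct, stop_free = table_row(j)
    print(f"{j}  {inverse:9.3f}  {direct:.3e}  {stop_free:.4f}")
```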

Table 13: BPTI Homologues

R #   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19

[Sequence rows for homologues 1 to 19 are illegible in the source.]

Table 13, continued.

R # - residue number

1 BPTI
2 Engineered BPTI From MARK87
3 Engineered BPTI From MARK87
4 Bovine Colostrum (DUFT85)
5 Bovine Serum (DUFT85)
6 Semisynthetic BPTI, TSCH87
7 Semisynthetic BPTI, TSCH87
8 Semisynthetic BPTI, TSCH87
9 Semisynthetic BPTI, TSCH87
10 Semisynthetic BPTI, TSCH87
11 Engineered BPTI, AOER87
12 Dendroaspis polylepis polylepis (Black mamba) venom I (DUFT85)
13 Dendroaspis polylepis polylepis (Black mamba) venom K (DUFT85)
14 Hemachatus hemachates (Ringhals cobra) HHV II (DUFT85)
15 Naja nivea (Cape cobra) NNV II (DUFT85)
16 Vipera russelli (Russell's viper) RVV II (TAKA74)
17 Red sea turtle egg white (DUFT85)
18 Snail mucus (Helix pomatia) (WAGN78)
19 Dendroaspis angusticeps (Eastern green mamba) C13 S1 C3 toxin (DUFT85)

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Table 13, continued. 

R # 20 21 22 23 24 25 26 27 28 29 30 31 32 33
 -5  -  -  -  -  -  -  -  -  -  -  -  -  -  D
 -4  -  -  -  -  -  -  -  -  -  -  -  -  -  E
 -3  -  -  -  -  -  -  -  -  -  -  -  -  T  P
 -2  Z  -  L  Z  R  K  -  -  -  R  R  -  E  T
 -1  P  -  Q  D  D  N  -  -  -  Q  K  -  R  T
  1  R  R  H  H  R  R  I  K  T  R  R  R  G  D
  2  R  P  R  P  P  P  N  E  V  H  H  P  F  L
  3  K  Y  T  K  K  T  G  D  A  R  P  D  L  P
  4  L  A  F  F  F  F  D  S  A  D  D  F  D  I
  5  C  C  C  C  C  C  C  C  C  C  C  C  C  C
  6  I  E  K  Y  Y  N  E  Q  N  D  D  L  T  E
  7  L  L  L  L  L  L  L  L  L  K  K  E  S  Q
  8  H  I  P  P  P  L  P  G  P  P  P  P  P  A
  9  R  V  A  A  A  P  K  Y  V  P  P  P  P FG
 10  N  A  E  D  D  E  V  S  I  D  D  Y  V  D
 11  P  A  P  P  P  T  V  A  R  K  T  T  T  A
 12  G  G  G  G  G  G  G  G  G  G  K  G  G  G
 13  R  P  P  R  R  R  P  P  P  N  I  P  P  L
 14  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 15  Y  M  K  K  L  N  R  M  R  -  -  K  R  F
 16  D  F  A  A  A  A  A  G  A  G  Q  A  A  G
 17  K  F  S  H  Y  L  R  M  F  P  T  K  G  Y
 18  I  I  I  I  M  I  F  T  I  V  V  M  F  M
 19  P  S  P  P  P  P  P  S  Q  R  R  I  K  K
 20  A  A  A  R  R  A  R  R  L  A  A  R  R  L
 21  F  F  F  F  F  F  Y  Y  W  F  F  Y  Y  Y
 22  Y  Y  Y  Y  Y  Y  Y  F  A  Y  Y  F  N  S
 23  Y  Y  Y  Y  Y  Y  Y  Y  F  Y  Y  Y  Y  Y
 24  N  S  N  D  N  N  N  N  D  D  K  N  N  N
 25  Q  K  W  S  P  S  S  G  A  T  P  A  T  Q
 26  K  G  A  A  A  H  S  T  V  R  S  K  R  E
 27  K  A  A  S  S  L  S  S  K  L  A  A  T  T
 28  K  N  K  N  N  H  K  M  G  K  K  G  K  K
 29  Q  K  K  K  K  K  R  A  K  T  R  F  Q  N
 30  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 31  E  Y  Q  N  E  Q  E  E  V  K  V  E  E  E
 32  R  P  L  K  K  K  K  T  L  A  Q  T  P  E
 33  F  F  F  F  F  F  F  F  F  F  F  F  F  F
 34  D  T  H  I  I  N  I  Q  P  Q  R  V  K  I
 35  W  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y
 36  S  S  G  G  G  G  G  G  G  R  G  G  G  G
 37  G  G  G  G  G  G  G  G  G  G  G  G  G  G
 38  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 39  G  R  K  P  R  G  G  M  Q  D  D  K  K  Q
 40  G  G  G  G  G  G  G  G  G  G  G  A  G  G
 41  N  N  N  N  N  N  N  N  N  D  D  K  N  N
 42  S  A  A  A  ?  A  A  A  G  G  H  H  G  D
 43  N  N  N  N  N  N  N  N  N  G  G  N  N  N

Table 13, continued.

20 Dendroaspis angusticeps (Eastern green mamba) C13 S2 C3 toxin (DUFT85)
21 Dendroaspis polylepis polylepis (Black mamba) B toxin (DUFT85)
22 Dendroaspis polylepis polylepis (Black mamba) E toxin (DUFT85)
23 Vipera ammodytes TI toxin (DUFT85)
24 Vipera ammodytes CTI toxin (DUFT85)
25 Bungarus fasciatus VIII B toxin (DUFT85)
26 Anemonia sulcata (sea anemone) 5 II (DUFT85)
27 Homo sapiens HI-14 "inactive" domain (DUFT85)
28 Homo sapiens HI-14 "active" domain (DUFT85)
29 beta bungarotoxin B1 (DUFT85)
30 beta bungarotoxin B2 (DUFT85)
31 Bovine spleen TI II (FIOR85)
32 Tachypleus tridentatus (Horseshoe crab) hemocyte inhibitor (NAKA87)
33 Bombyx mori (silkworm) SCI-III (SASA84)

Notes:

a) both beta bungarotoxins have residue 15 deleted.

b) B. mori has an extra residue between C5 and C14; we have assigned F and G to residue 9.

c) all natural proteins have C at 5, 14, 30, 38, 51, & 55.

d) all homologues have F33 and G37.

e) extra C's in bungarotoxins form interchain cystine bridges.
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Table 14: 



Tally of Ionizable Groups, BPTI homologues.
Sequences given in Table 10.
+ is the sum K + R + NH - D - E - CO2, the approximate charge on the molecule at pH 7.0.
# is the sum K + R + NH + D + E + CO2, i.e. the number of ionized groups at pH 7.0.
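The two tallies defined above can be sketched in a few lines. This is a minimal illustration, not the procedure used to build Table 14: the function name is hypothetical, NH and CO2 are read here as the free amino and carboxyl termini (an assumption), and the example sequence is mature BPTI written out from the residue column of Table 16.

```python
# BPTI residues 1-58, written out from the residue column of Table 16.
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"


def tally_ionizable(sequence, free_amino=True, free_carboxyl=True):
    """Return (net charge '+', number of ionized groups '#') at pH 7.0.

    Positive groups: K, R and the amino terminus (NH).
    Negative groups: D, E and the carboxyl terminus (CO2).
    The two flags are an assumption that allows for blocked termini.
    """
    positive = sequence.count("K") + sequence.count("R") + (1 if free_amino else 0)
    negative = sequence.count("D") + sequence.count("E") + (1 if free_carboxyl else 0)
    return positive - negative, positive + negative


if __name__ == "__main__":
    print(tally_ionizable(BPTI))
```

Histidine is not counted, following the definition given for Table 14.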
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75 



Table 15: Amino acids observed at each residue of BPTI homologues

      Number
Res.  Different
 #    AAs        Contents                                BPTI
 -5    2         D -32
 -4    2         E -32
 -3    5         T P F Z -29
 -2   10         Z3 R3 Q2 T2 H G L K E -18
 -1   10         D4 T2 P2 Q2 E G N K R -18
  1   10         R21 A2 K2 H2 P L I T G D                R
  2    9         P20 R4 A2 H2 N E V F L                  P
  3   10         D15 K6 T3 R2 P2 S Y G A L               D
  4    7         F19 D4 L3 Y2 I2 A2 S                    F
  5    1         C33                                     C
  6   10         L11 E5 N4 K3 Q2 I2 Y2 D2 T R            L
  7    5         L18 E11 K2 S Q                          E
  8    7         P26 H2 A2 I L G F                       P
  9    9         P17 A6 V3 R2 Q L K Y F                  P
 10   10         Y11 E7 D4 A2 N2 R2 V2 S I D             Y
 11   10         T17 P5 A3 R2 I S Q Y V K                T
 12    2         G32 K                                   G
 13    5         P22 R6 L3 N I                           P
 14    3         C31 T A                                 C
 15   12         K15 R4 Y2 M2 L2 -2 V G A I N F          K
 16    7         A22 G5 Q2 R K D F                       A
 17   12         R12 K5 A2 Y3 H2 S2 F2 L M T G P         R
 18    6         I21 M4 F3 L2 V2 T                       I
 19    7         I11 P10 R6 S2 K2 L Q                    I
 20    5         R19 A7 S4 L2 Q                          R
 21    4         Y18 F13 W I                             Y
 22    6         F14 Y14 H2 A N S                        F
 23    2         Y32 F                                   Y
 24    4         N26 K3 D3 S                             N
 25   10         A12 S5 Q3 P3 W3 L2 T2 K G R             A
 26    9         K16 A6 T2 E2 S2 R2 G H V                K
 27    5         A18 S8 K3 L2 T2                         A
 28    7         G13 K10 N5 Q2 R H M                     G
 29   10         L9 Q7 K7 A2 F2 R2 M G T N               L
 30    1         C33                                     C
 31    7         Q12 E11 L4 K2 V2 Y N                    Q
 32   11         T12 P5 K4 Q3 E2 L2 G V S R A            T
 33    1         F33                                     F
 34   11         V11 I8 T3 D2 N2 Q2 F H P R K            V
 35    2         Y31 W2                                  Y
 36    3         G27 S5 R                                G
 37    1         G33                                     G
 38    3         C31 T A                                 C
 39    7         R13 G9 K4 Q3 D2 P M                     R

Table 15: continued.

      Number
Res.  Different
 #    AAs        Contents                                BPTI
 40    2         G22 A11                                 A
 41    3         N20 K11 D2                              K
 42    9         A11 R9 S4 G3 H2 D Q K N                 R
 43    2         N31 G2                                  N
 44    3         N21 R11 K                               N
 45    2         F32 Y                                   F
 46    8         K24 E2 S2 D H V Y R                     K
 47    2         T19 S14                                 S
 48    9         A11 I9 E4 T2 W2 L2 R K D                A
 49    7         E19 D6 A2 Q2 K2 T H                     E
 50    6         E16 D12 L2 M Q K                        D
 51    1         C33                                     C
 52    7         R13 M10 L3 E3 Q2 H V                    M
 53    8         R21 Q3 E2 H2 C2 G K D                   R
 54    7         T23 A3 V2 E2 I Y K                      T
 55    1         C33                                     C
 56    8         G15 V8 I3 E2 R2 A L S                   G
 57    8         G19 V4 A3 P2 -2 R L N                   G
 58    8         A11 -10 P3 K3 S2 Y2 R F                 A
 59    9         -24 G2 Q E A Y S P R
 60    6         -28 Q R I G D
 61    3         -31 T P
 62    2         -32 D
 63    2         -32 K
 64    2         -32 S
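Each "Contents" entry of Table 15 is a tally of one alignment column of Table 13. The following sketch shows how such an entry could be produced; the function name is hypothetical, ties are ordered alphabetically (a choice made here, not necessarily the ordering used in the table), and the two example columns cover only homologues 20 through 33, so the counts are smaller than the 33-sequence totals of Table 15.

```python
from collections import Counter


def tally_column(column):
    """Summarize one alignment column as (number of different AAs, contents).

    A count of 1 is left implicit, as in Table 15; '-' marks a gap.
    """
    counts = Counter(column)
    # Most frequent first; ties broken alphabetically for a stable result.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    contents = " ".join(aa if n == 1 else f"{aa}{n}" for aa, n in ordered)
    return len(counts), contents


if __name__ == "__main__":
    # Residues 12 and 15 of homologues 20-33, copied from Table 13.
    print(tally_column("GGGGGGGGGGKGGG"))
    print(tally_column("YMKKLNRMR--KRF"))
```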
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Table 16: Exposure in BPTI 
Coordinates taken from Brookhaven Protein Data Bank entry 6PTI.

HEADER    PROTEINASE INHIBITOR (TRYPSIN)    13-MAY-87    6PTI
COMPND    BOVINE PANCREATIC TRYPSIN INHIBITOR
COMPND  2 (BPTI, CRYSTAL FORM III)
AUTHOR    A. WLODAWER

Solvent radius = 1.40
Atomic radii given in Table 7
Areas in Angstroms-squared.

                      Not covered             Not covered
Residue   Total area  by M/C      fraction    at all       fraction
ARG  1    342.45      205.09      0.5989      152.49       0.4453
PRO  2    239.12       92.65      0.3875       47.56       0.1989
ASP  3    272.39      158.77      0.5829      143.23       0.5258
PHE  4    311.33      137.82      0.4427       43.21       0.1388
CYS  5    241.06       48.36      0.2006        0.23       0.0010
LEU  6    280.93      151.45      0.5390      115.37       0.4124
GLU  7    291.39      128.91      0.4424       90.39       0.3102
PRO  8    236.12      128.71      0.5451       99.98       0.4234
PRO  9    236.09      109.82      0.4652       45.80       0.1940
TYR 10    330.97      153.63      0.4642       79.49       0.2402
THR 11    249.20       80.10      0.3214       64.99       0.2608
GLY 12    184.21       56.75      0.3081       23.05       0.1252
PRO 13    240.07      130.25      0.5426       75.27       0.3136
CYS 14    237.10       75.55      0.3186       53.52       0.2257
LYS 15    310.77      200.25      0.6444      192.00       0.6178
ALA 16    209.41       66.63      0.3182       45.59       0.2177
ARG 17    351.09      243.67      0.6940      201.48       0.5739
ILE 18    277.10      100.51      0.3627       58.95       0.2127
ILE 19    278.03      146.06      0.5254       96.05       0.3455
ARG 20    339.11      144.65      0.4266       43.81       0.1292
TYR 21    333.60      102.24      0.3065       69.67       0.2089
PHE 22    306.08       70.64      0.2308       23.01       0.0752
TYR 23    338.66       77.05      0.2275       17.34       0.0512
ASN 24    264.88       99.03      0.3739       38.69       0.1461
ALA 25    211.15       85.13      0.4032       48.20       0.2283
LYS 26    313.29      216.14      0.6899      202.84       0.6474
ALA 27    210.66       96.05      0.4560       54.78       0.2601
GLY 28    186.83       71.52      0.3828       32.09       0.1718
LEU 29    280.70      132.42      0.4718       93.61       0.3335
CYS 30    238.15       57.27      0.2405       19.33       0.0812
GLN 31    301.15      141.80      0.4709       82.64       0.2744
THR 32    251.26      138.17      0.5499       76.47       0.3043
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Table 16, continued. 



                      Not covered             Not covered
Residue   Total area  by M/C      fraction    at all       fraction
PHE 33    304.27       59.79      0.1965       18.91       0.0622
VAL 34    251.56      109.78      0.4364       42.36       0.1684
TYR 35    [illeg.]    [illeg.]    0.2421       15.05       0.0452
GLY 36    [illeg.]    [illeg.]    0.0636        1.97       0.0105
GLY 37    [illeg.]     84.26      0.4548       39.17       0.2114
CYS 38    [illeg.]     73.64      0.3139       26.40       0.1125
ARG 39    [illeg.]    [illeg.]    0.7303      250.73       0.6011
ALA 40    [illeg.]     94.01      0.4487       52.95       0.2527
LYS 41    [illeg.]    [illeg.]    0.5284      108.77       0.3457
ARG 42    [illeg.]    [illeg.]    0.6670      179.59       0.5145
ASN 43    [illeg.]    [illeg.]    0.1446        5.32       0.0200
ASN 44    [illeg.]    [illeg.]    0.3378       23.39       0.0867
PHE 45    [illeg.]     69.73      0.2226       14.79       0.0472
LYS 46    [illeg.]    217.18      0.7010      155.73       0.5026
SER 47    224.78       69.11      0.3075       24.80       0.1103
ALA 48    211.01       82.06      0.3889       31.07       0.1473
GLU 49    286.62      161.00      0.5617      100.01       0.3489
ASP 50    299.53      156.42      0.5222       95.96       0.3204
CYS 51    238.68       24.51      0.1027        0.00       0.0000
MET 52    293.05       89.48      0.3054       66.70       0.2276
ARG 53    356.20      224.61      0.6306      189.75       0.5327
THR 54    251.53      116.43      0.4629       51.64       0.2053
CYS 55    240.40       69.95      0.2910        0.00       0.0000
GLY 56    184.66       60.79      0.3292       32.78       0.1775
GLY 57    106.58       49.71      0.4664       38.28       0.3592
ALA 58    no position given in P

"Total area" is the area measured by a rolling sphere of radius 1.4 A, where only the atoms within the residue are considered. This takes account of conformation.

"Not covered by M/C" is the area measured by a rolling sphere of radius 1.4 A where all main-chain atoms are considered. Fraction is the exposed area divided by the total area. Surface buried by main-chain atoms is more definitely covered than is surface covered by side group atoms.

"Not covered at all" is the area measured by a rolling sphere of radius 1.4 A where all atoms of the protein are considered.
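The "fraction" columns of Table 16 follow directly from the definition above: exposed area divided by total area. The sketch below recomputes them for three rows copied from the table; the function and variable names are illustrative assumptions, and the rolling-sphere area calculation itself is not reproduced.

```python
# (total area, not covered by M/C, not covered at all), copied from Table 16.
AREAS = {
    "ARG 1": (342.45, 205.09, 152.49),
    "CYS 5": (241.06, 48.36, 0.23),
    "LYS 15": (310.77, 200.25, 192.00),
}


def exposure_fractions(total, not_covered_by_mc, not_covered_at_all):
    """Return the two fraction columns, rounded to four places as in the table."""
    return (
        round(not_covered_by_mc / total, 4),
        round(not_covered_at_all / total, 4),
    )


if __name__ == "__main__":
    for residue, areas in AREAS.items():
        print(residue, exposure_fractions(*areas))
```

A residue such as CYS 5, whose second fraction is nearly zero, is buried once the whole protein is considered, whereas LYS 15 remains largely exposed.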



Table 17: Plasmids used in Detailed Example

Phage     Contents
LG1       M13mp18 with Ava II/Aat II/Acc I/Rsr II/Sau I adaptor
pLG2      LG1 with amp R and ColE1 of pBR322 cloned into Aat II/Acc I sites
pLG3      pLG2 with Acc I site removed
pLG4      pLG3 with first part of osp-pbd gene cloned into Rsr II/Sau I sites, Avr II/Asu II sites created
pLG5      pLG4 with second part of osp-pbd gene cloned into Avr II/Asu II sites, BssH II site created
pLG6      pLG5 with third part of osp-pbd gene cloned into Asu II/BssH II sites, Bbe I site created
pLG7      pLG6 with last part of osp-pbd gene cloned into Bbe I/Asu II sites
pLG8      pLG7 with disabled osp-pbd gene, same length DNA
pLG9      pLG7 mutated to display BPTI (V15 BPTI)
pLG10     pLG8 + tet R gene - amp R gene
pLG11     pLG9 + tet R gene - amp R gene

EP 0 436 597 B1
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Table 25: Annotated Sequence of iobd gene 



5'-  C | GGA | CCG | TAT | CCA | GGC | TTT | ACA | CTT | TAT |     28
     |   Rsr II    |                 |    -35    |

   | GCT | TCC | GGC | TCG | TAT | AAT | GTG | TGG |     52

   | AAT | TGT | GAG | CGG | ATA | ACA | ATT |     73
   |            lac operator             |

   | CCT | AGG | AGG | CTC | ACT |     88
   |  Avr II   |
         |   S. D.   |

      m     k     k     s     l     v     l     k     a     s
      1     2     3     4     5     6     7     8     9    10
   | ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC |
                                       |  Afl II   |   Nhe I   |

      v     a     v     a     t     l     v     p     m     l
     11    12    13    14    15    16    17    18    19    20
   | GTT | GCT | GTC | GCG | ACC | CTG | GTA | CCG | ATG | CTG |
               |      Nru I      |      Kpn I      |

      s     f     a     r     p     d     f     c     l     e
     21    22    23    24    25    26    27    28    29    30
   | TCT | TTT | GCT | CGT | CCG | GAT | TTC | TGT | CTC | GAG |
                     |     Acc III     |           |   Ava I   |
                                                   |   Xho I   |

Table 25, continued.

      p     p     y     t     g     p     c     k     a     r
     31    32    33    34    35    36    37    38    39    40
   | CCG | CCA | TAT | ACT | GGG | CCC | TGC | AAA | GCG | CGC |
         |        PflM I         |                 |  BssH II  |
                           |   Apa I   |
                           |  Dra II   |
                           |   Pss I   |

      i     i     r     y     f     y     n     a     k
     41    42    43    44    45    46    47    48    49
   | ATC | ATC | CGT | TAT | TTC | TAC | AAC | GCT | AAA |

Table 25, continued.

      a     g     l     c     q     t     f     v     y     g     g
     50    51    52    53    54    55    56    57    58    59    60
   | GCA | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT |
   |      Stu I      |                       |   Acc I   |
                                             |   Xca I   |

      c     r     a     k     r     n     n     f     k
     61    62    63    64    65    66    67    68    69
   | TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA |

      s     a     e     d     c     m     r     t     c     g
     70    71    72    73    74    75    76    77    78    79
   | TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT |
   |     Xma III     |     |      Sph I      |

      g     a     a     e     g     d     d
     80    81    82    83    84    85    86
   | GGC | GCC | GCT | GAA | GGT | GAT | GAT |
   |   Bbe I   |
   |   Nar I   |

      p     a     k     a     a
     87    88    89    90    91
   | CCG | GCC | AAA | GCG | GCC |
   |            Sfi I            |

Table 25, continued.

      f     n     s     l     q     a     s     a     t
     92    93    94    95    96    97    98    99   100
   | TTT | AAC | TCT | CTG | CAA | GCT | TCT | GCT | ACC |    388
                           |    Hind III     |

      e     y     i     g     y     a     w
    101   102   103   104   105   106   107
   | GAA | TAT | ATC | GGT | TAC | GCG | TGG |    409
                           |      Mlu I      |

      a     m     v     v     v
    108   109   110   111   112
   | GCC | ATG | GTG | GTG | GTT |    424
   |           BstX I            |
   |      Nco I      |

Table 25, continued.

      i     v     g     a     t     i     g     i
    113   114   115   116   117   118   119   120
   | ATC | GTT | GGT | GCT | ACC | ATC | GGT | ATC |    448

      k     l     f     k     k     f     t     s     k     a
    121   122   123   124   125   126   127   128   129   130
   | AAA | CTG | TTT | AAG | AAA | TTT | ACT | TCG | AAA | GCG |    478
                                       |     Asu II      |

    131   132   133   134
   | TCT | TAA | TAG | TGA | GGT | TAC | CAG | TCT |    502

   | AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT |    532
   |                      Trp terminator                       |

   | CCT | GAG | G -3'    539
   |   Sau I     |

Note the following enzyme equivalences:

Xma III = Eag I
Acc III = BspM II
Dra II  = EcoO109 I
Asu II  = BstB I
Sau I   = Bsu36 I
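The annotations of Table 25 can be spot-checked mechanically: translate the codons with the standard genetic code and search for the recognition sequences of the named enzymes. The following is a minimal sketch for codons 1 through 10 only; the helper names are assumptions, and the recognition sequences used (CTTAAG for Afl II, GCTAGC for Nhe I) are the standard ones for those enzymes.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Codons 1-10 of the gene, copied from Table 25.
CODONS_1_TO_10 = "ATGAAGAAATCTCTGGTTCTTAAGGCTAGC"

# Recognition sequences of two enzymes annotated under these codons.
SITES = {"Afl II": "CTTAAG", "Nhe I": "GCTAGC"}


def translate(dna):
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def find_sites(dna, sites=SITES):
    """Return the 0-based offset of the first occurrence of each site, or -1."""
    return {name: dna.upper().find(seq) for name, seq in sites.items()}


if __name__ == "__main__":
    print(translate(CODONS_1_TO_10))
    print(find_sites(CODONS_1_TO_10))
```

The Afl II site falls on codons 7 and 8 and the Nhe I site on codons 9 and 10, in agreement with the annotation.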
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Table 27: DNA_synthl 

5'- | CCG | TCC | GTC | GGA | CCG | TAT | CCA | GGC | TTT | ACA | CTT | TAT |

    | GCT | TCC | GGC | TCG | TAT | AAT | GTG | TGG |

    | AAT | TGT | GAG | CGG | ATA | ACA | ATT |
                    olig#4 = 3'-     gt   taa

    | CCT | AGG |
      gga   tcc

                            / 3' = olig#3
    | GCC | GCT | CCT | TCG | AAA | GCG |
      cgg   cga   gga   agc   ttt   cgc

    | TCT | TAA | TAG | TGA | GGT | TAC | CAG | TCT |
      aga   att   atc   act   cca   atg   gtc   aga

    | AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT |
      ttc   ggg   cgg   att   act   cgc   ccg   aaa   aaa   aaa

    | CCT | GAG | GCA | GGT | GAG | CG
      gga   ctc   cgt   cca   ctc   gc  - 5'
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Table 27, continued 

"Top" strand 99 
"Bottom" strand 100 

Overlap 23 (14 c/g and 9 a/t) 

Net length 158 
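The overlap between the two synthetic strands of Table 27 can be checked by complementing the 3' end of the top strand and counting base pairs. This is a sketch under stated assumptions: the helper names are hypothetical, and the overlap is taken to be the 23 top-strand bases that lie above the start of olig#4 (the last five bases of the ACA ATT line, then CCT AGG, then GCC GCT CCT TCG).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Assumed overlap region, top strand 5'->3', read from Table 27.
TOP_OVERLAP = "CAATT" + "CCTAGG" + "GCCGCTCCTTCG"

# The bottom-strand bases printed beneath it, written 3'->5'.
BOTTOM_OVERLAP_3_TO_5 = "gttaaggatcccggcgaggaagc"


def complement(strand):
    """Base-by-base complement, without reversing (top 5'->3' gives bottom 3'->5')."""
    return strand.translate(COMPLEMENT)


def count_pairs(segment):
    """Return (number of C/G positions, number of A/T positions)."""
    segment = segment.upper()
    cg = sum(segment.count(base) for base in "CG")
    at = sum(segment.count(base) for base in "AT")
    return cg, at


if __name__ == "__main__":
    print(complement(TOP_OVERLAP).lower() == BOTTOM_OVERLAP_3_TO_5)
    print(len(TOP_OVERLAP), count_pairs(TOP_OVERLAP))
```

With these assumptions the overlap is 23 bases, 14 c/g and 9 a/t, as stated in the table.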
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Table 28: DNA — seq2 

5'- | gca | cca | acg |
    |     spacer      |

   | CCT | AGG | AGG | CTC | ACT |
   |  Avr II   |
         |   S. D.   |

      m     k     k     s     l     v     l     k     a     s
      1     2     3     4     5     6     7     8     9    10
   | ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC |
                                       |  Afl II   |   Nhe I   |

      v     a     v     a     t     l     v     p     m     l
     11    12    13    14    15    16    17    18    19    20
   | GTT | GCT | GTC | GCG | ACC | CTG | GTA | CCG | ATG | CTG |
               |      Nru I      |      Kpn I      |

      s     f     a     r     p     d     f     c     l     e
     21    22    23    24    25    26    27    28    29    30
   | TCT | TTT | GCT | CGT | CCG | GAT | TTC | TGT | CTC | GAG |
                     |     Acc III     |           |   Ava I   |
                                                   |   Xho I   |

      p     p     y     t     g     p     c     k     a     r
     31    32    33    34    35    36    37    38    39    40
   | CCG | CCA | TAT | ACT | GGG | CCC | TGC | AAA | GCG | CGC |
         |        PflM I         |                 |  BssH II  |
                           |   Apa I   |
                           |  Dra II   |
                           |   Pss I   |

Table 28, continued.

      i     i     r
     41    42    43
   | atc | atc | cgt |

      t     s     k
    127   128   129
   | ACT | TCG | AAa | gcg | gct | gcg | -3'
   |     Asu II      |     spacer      |
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Table 30: DNA_seq3 



                           39    40
5'- | ccc | tgc | aca | GCG | CGC |
    |     spacer      |  BssH II  |

      i     i     r     y     f     y     n     a     k
     41    42    43    44    45    46    47    48    49
   | ATC | ATC | CGT | TAT | TTC | TAC | AAC | GCT | AAA |

      a     g     l     c     q     t     f     v     y     g     g
     50    51    52    53    54    55    56    57    58    59    60
   | GCA | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT |
   |      Stu I      |                       |   Acc I   |
                                             |   Xca I   |

      c     r     a     k     r     n     n     f     k
     61    62    63    64    65    66    67    68    69
   | TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA |

      s     a     e     d     c     m     r     t     c     g
     70    71    72    73    74    75    76    77    78    79
   | TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT |
   |     Xma III     |     |      Sph I      |

Table 30, continued.

      g     a
     80    81
   | GGC | GCC | gct | gaa |
   |   Bbe I   |  spacer   |
   |   Nar I   |

                  t     s     k
                127   128   129
   | ttt | acT | TCG | AAa | gcg | tcg | ccg | -3'
         |     Asu II      |
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Table 32: DNA_seq4 





5 


                        g     a     a     e     g     d     d
                       80    81    82    83    84    85    86
5'- | cct | cgc | cct | GGC | GCC | GCT | GAA | GGT | GAT | GAT |
    |     spacer      |   Bbe I   |
                      |   Nar I   |

      p     a     k     a     a
     87    88    89    90    91
   | CCG | GCC | AAA | GCG | GCC |
   |            Sfi I            |

      f     n     s     l     q     a     s     a     t
     92    93    94    95    96    97    98    99   100
   | TTT | AAC | TCT | CTG | CAA | GCT | TCT | GCT | ACC |
                           |    Hind III     |

      e     y     i     g     y     a     w
    101   102   103   104   105   106   107
   | GAA | TAT | ATC | GGT | TAC | GCG | TGG |
                           |      Mlu I      |

      a     m     v     v     v
    108   109   110   111   112
   | GCC | ATG | GTG | GTG | GTT |
   |           BstX I            |
   |      Nco I      |

Table 32, continued.

      i     v     g     a     t     i     g     i
    113   114   115   116   117   118   119   120
   | ATC | GTT | GGT | GCT | ACC | ATC | GGT | ATC |

      k     l     f     k     k     f     t     s     k
    121   122   123   124   125   126   127   128   129
   | AAA | CTG | TTT | AAG | AAA | TTT | ACT | TCG | AAa | gcg | tcg | ggc | -3'
                                       |     Asu II      |     spacer      |



Table 34: Some interaction sets in BPTI

      Number
Res.  Diff.
 #    AAs     Contents                                BPTI   Sets (1 to 5)
 -5    2      D -32
 -4    2      E -32
 -3    5      T P F Z -29
 -2   10      Z3 R3 Q2 T2 H G L K E -18
 -1   10      D4 T2 P2 Q2 E G N K R -18
  1   10      R21 A2 K2 H2 P L I T G D                R      5
  2    9      P20 R4 A2 H2 N E V F L                  P      s 5
  3   10      D15 K6 T3 R2 P2 S Y G A L               D      4 s
  4    7      F19 D4 L3 Y2 I2 A2 S                    F      s 5
  5    1      C33                                     C      x x
  6   10      L11 E5 N4 K3 Q2 I2 Y2 D2 T R            L      4
  7    5      L18 E11 K2 S Q                          E      s 4
  8    7      P26 H2 A2 I L G F                       P      3 4
  9    9      P17 A6 V3 R2 Q L K Y F                  P      s 3 4
 10   10      Y11 E7 D4 A2 N2 R2 V2 S I D             Y      s s 4
 11   10      T17 P5 A3 R2 I S Q Y V K                T      1 s 3 4
 12    2      G32 K                                   G      x x x

Table 34, continued.

 13    5      P22 R6 L3 N I                           P      1 s 4 s
 14    3      C31 T A                                 C      1 s s 5
 15   12      K15 R4 Y2 M2 L2 -2 V G A I N F          K      1 s 3 4 s
 16    7      A22 G5 Q2 R K D F                       A      1 s s s 5
 17   12      R12 K5 A2 Y3 H2 S2 F2 L M T G P         R      1 2 3 s
 18    6      I21 M4 F3 L2 V2 T                       I      1 s s 5
 19    7      I11 P10 R6 S2 K2 L Q                    I      1 2 3 s
 20    5      R19 A7 S4 L2 Q                          R      s s s 5
 21    4      Y18 F13 W I                             Y      2 s s s
 22    6      F14 Y14 H2 A N S                        F      s 3 4
 23    2      Y32 F                                   Y      s s
 24    4      N26 K3 D3 S                             N      s 3
 25   10      A12 S5 Q3 P3 W3 L2 T2 K G R             A      5 s
 26    9      K16 A6 T2 E2 S2 R2 G H V                K      s 3 4
 27    5      A18 S8 K3 L2 T2                         A      2 3 4
 28    7      G13 K10 N5 Q2 R H M                     G      2 s s
 29   10      L9 Q7 K7 A2 F2 R2 M G T N               L      2 3
 30    1      C33                                     C      x x x
 31    7      Q12 E11 L4 K2 V2 Y N                    Q      2 3 4
 32   11      T12 P5 K4 Q3 E2 L2 G V S R A            T      2 3 s
 33    1      F33                                     F      x x x x
 34   11      V11 I8 T3 D2 N2 Q2 F H P R K            V      1 2 3 s
 35    2      Y31 W2                                  Y      s s s 5
 36    3      G27 S5 R                                G      1
 37    1      G33                                     G      x x
 38    3      C31 T A                                 C      1 s 5
 39    7      R13 G9 K4 Q3 D2 P M                     R      1 4 s

Table 34, continued.

 40    2      G22 A11                                 A      s s 5
 41    3      N20 K11 D2                              K      4 s
 42    9      A11 R9 S4 G3 H2 D Q K N                 R      s 5
 43    2      N31 G2                                  N
 44    3      N21 R11 K                               N
 45    2      F32 Y                                   F
 46    8      K24 E2 S2 D H V Y R                     K      5
 47    2      T19 S14                                 S      s 5
 48    9      A11 I9 E4 T2 W2 L2 R K D                A      2 s s
 49    7      E19 D6 A2 Q2 K2 T H                     E      2 s
 50    6      E16 D12 L2 M Q K                        D      s 5
 51    1      C33                                     C      x x
 52    7      R13 M10 L3 E3 Q2 H V                    M      2 s
 53    8      R21 Q3 E2 H2 C2 G K D                   R      s 5
 54    7      T23 A3 V2 E2 I Y K                      T      5
 55    1      C33                                     C      x
 56    8      G15 V8 I3 E2 R2 A L S                   G
 57    8      G19 V4 A3 P2 -2 R L N                   G
 58    8      A11 -10 P3 K3 S2 Y2 R F                 A
 59    9      -24 G2 Q E A Y S P R                    -
 60    6      -28 Q R I G D
 61    3      -31 T P
 62    2      -32 D
 63    2      -32 K
 64    2      -32 S

A number indicates membership in the set of that number.

s indicates secondary set

x indicates in or close to surface but buried and/or highly conserved.



Table 35: Distances from C beta to Tip of Side Group in Angstroms

Amino Acid type    Distance
A                  0.0
C (reduced)        1.8
D                  2.4
E                  3.5
F                  4.3
G                  -
H                  4.0
I                  2.5
K                  5.1
L                  2.6
M                  3.8
N                  2.4
P                  2.4
Q                  3.5
R                  6.0
S                  1.5
T                  1.5
V                  1.5
W                  5.3
Y                  5.7

Notes: These distances were calculated for standard model parts with all side groups fully extended.
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Table 36: Distances, BPTI residue set #2 
Distances in Angstroms between C betas.
Hypothetical C beta was added to each Glycine.
Entries shown as [illegible] or containing "?" could not be read in this copy.

        R17   I19   Y21   A27   G28   L29   Q31   T32   V34   A48
I19     7.7
Y21    15.1   8.4
A27    22.6  17.1  12.2
G28    26.6  20.4  13.8   5.3
L29    22.5  15.8   9.6   5.1   5.2
Q31    16.1  10.4   6.3   6.8  10.6   6.8
T32    11.7   5.2   6.1  12.0  15.5  10.9   5.4
V34     5.6   6.5  11.6  17.6  21.7  18.0  11.4   8.2
A48    18.5  11.0   5.4  12.6  13.3   8.4   8.8   8.3  15.7
E49    22.0  14.7   8.9  16.9  16.1  12.2  13.9  13.3  19.8   5.5
M52    23.6  1?.3   8.6  12.2  10.3   7.6  11.3  13.2  20.0   6.2
P9     14.0  11.3   9.0  12.2  15.4  13.3   7.9   9.2   8.7  13.9
T11     9.5  11.2  13.5  18.8  22.5  19.8  13.5  12.1   5.7  18.5
K15     7.9  14.6  20.1  27.4  31.3  27.9  21.4  18.1  10.3  24.6
A16     5.5  10.1  15.9  25.2  28.5  24.6  18.6  14.5   8.6  19.8
I18     6.1   6.0  11.2  21.3  24.4  20.2  14.7  10.4   7.0  15.0
R20    10.6   5.9   5.4  16.0  18.5  14.6   9.8   6.9   7.8  10.2
F22    15.6  10.9   5.6  10.5  12.8  10.3   6.2   8.1  10.8  10.3
N24    19.9  14.7   9.4   4.1   7.3   6.1   4.8  10.0  14.7  11.4
K26    24.4  20.1  15.2   5.4   7.7   9.8  10.1  15.3  19.0  17.0
C30    18.9  12.1   4.6   8.8   9.5   5.3   5.9   8.2  14.9   4.9
F33    10.8   7.4   7.7  12.6  16.4  13.0   6.6   5.6   5.5  12.2
Y35     8.4   7.4   9.4  18.4  21.4  17.9  12.2   9.5   5.8  14.4
S47    17.6  10.6   6.6  17.3  17.9  13.4  12.6  10.4  15.9   5.3
D50    20.0  13.6   7.2  17.2  16.8  13.5  13.5  12.9  17.6   7.6
C51    18.9  12.2   4.0  12.1  12.2   8.8   8.8   9.7  15.3   5.4
R53    25.4  18.6  11.0  17.2  15.0  13.0  15.7  16.7  22.3   9.7
R39    15.4  16.9  17.1  24.9  27.2  24.9  20.1  18.7  13.8  22.3

Table 36, continued.

        E49   M52   P9    T11   K15   A16   I18   R20   F22   N24
M52    [illegible]
P9     [illegible]
T11    [illegible]
K15    [illegible]
A16    [illegible]
I18    [illegible]
R20    13.0  13.8   8.0   9.4  14.9  10.6   6.2
F22    13.8  11.4   4.1  10.6  19.1  16.3  12.7   6.9
N24    15.6  11.2   8.4  15.3  24.1  21.9  1?.2  12.7   6.6
K26    20.9  15.7  12.1  18.6  27.9  26.6  23.3  18.1  11.6   5.9
C30     8.7   5.6  10.6  16.6  24.1  20.2  15.7   9.8   6.8   6.9
F33    [illegible]
Y35    17.2  17.8   7.8   5.8  11.0   7.6   4.9   4.3   8.8  14.8
S47    [illegible]
D50    [illegible]
C51    [illegible]
R53    [illegible]
R39    [illegible]

        K26   C30   F33   Y35   S47   D50   C51   R53
C30    12.4
F33    13.9  10.1
Y35    19.5  13.5   6.4
S47    21.0   8.8  13.5  13.2
D50    20.1   8.6  14.3  13.7   5.0
C51    15.0   3.7  10.9  12.5   6.9   5.2
R53    19.9   9.9  18.2  18.8   9.4   5.8   7.4
R39    24.3  20.6  14.4   9.6  20.4  19.0  18.8  23.4
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Table 37: vgDNA to vary BPTI set #2.1 



+ marks a variegated codon (residue shown as x).

                                        +
            g     p     c     k     a     x
           35    36    37    38    39    40
 spacer | GGG | CCC | TGC | AAA | GCG | qfk |    208
        |   Apa I   |

            +
      i     x     r     y     f     y     n     a     k
     41    42    43    44    45    46    47    48    49
   | ATC | qfk | CGT | TAT | TTC | TAC | AAC | GCT | AAA |    235

                                             / 3' = olig#27, 72 nts
      +           +                             +
      x     g     x     c     q     t     f     x     y     g     g
     50    51    52    53    54    55    56    57    58    59    60
   | qfk | GGt | qfk | TGC | CAG | ACC | TTc | qfk | TAC | GGT | GGT |
        olig#28 = 3'-  acg   gtc   tgg   aag   **m   atg   cca   cca
        78 nts

   Overlap = 12 (7 CG, 5 AT)

      c     r     a     k     r     n     n     f     k
     61    62    63    64    65    66    67    68    69
   | TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA |
     acg   gca   cga   ttc   gca   ttg   ttg   aaa   ttt
               |    Esp I    |

            +
      s     x     e     d     c     m
     70    71    72    73    74    75
   | TCT | qfk | GAG | GAT | TGC | ATG | C              322
     agc   **m   ctc   cta   acg   tac   gca   ccc   acc  -5'
                           |      Sph I      |  spacer  |

k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue            40     42     50     52     57     71
Possibilities      21  x  21  x  21  x  21  x  21  x  21  =  8.6 x 10^7
Abundance x 10
  of PPBD        .768   .271   .459   .671   .600   .459
Product = 1.77 x 10^-8

Parent = 1/(5.5 x 10^7)     least favored = 1/(4.2 x 10^9)
Least favored one-amino-acid substitution from PPBD present at 1 in 1.6 x 10^7
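The summary figures beneath Table 37 can be retraced from the abundances of Table 12. The sketch below is only an illustration of that arithmetic: the variable names are assumptions, the parent residues at positions 40, 42, 50, 52, 57 and 71 are read from Table 25 (r, i, a, l, v, a), and the result for the parent comes out near 1/(5.7 x 10^7), slightly different from the printed 1/(5.5 x 10^7), presumably because rounded abundances are used here.

```python
import math

# Abundances from Table 12 (optimum vgCodon, 5% errors), as fractions.
ABUNDANCE = {"R": 0.0768, "I": 0.0271, "A": 0.0459, "L": 0.0671, "V": 0.0600, "F": 0.0249}

# Parental amino acids at the six variegated residues 40, 42, 50, 52, 57, 71.
PARENT = ["R", "I", "A", "L", "V", "A"]

# Each variegated codon allows 20 amino acids plus stop.
possibilities = 21 ** len(PARENT)

# Probability that all six positions carry the parental amino acid.
p_parent = math.prod(ABUNDANCE[aa] for aa in PARENT)

# Probability of the least favored sequence: F (the lfaa) at every position.
p_least = ABUNDANCE["F"] ** len(PARENT)

if __name__ == "__main__":
    print(f"possibilities = {possibilities:.2e}")
    print(f"parent        = {p_parent:.3e}  (1 in {1 / p_parent:.2e})")
    print(f"least favored = {p_least:.3e}  (1 in {1 / p_least:.2e})")
```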
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Table 38: Result of varying set#2 of BPTI 2,1 



Upper-case letters mark the amino acids at the varied residues.

      l     e
     29    30
   | CTC | GAG |    178
   |   Ava I   |
   |   Xho I   |

      p     p     y     t     g     p     c     k     a     D
     31    32    33    34    35    36    37    38    39    40
   | CCG | CCA | TAT | ACT | GGG | CCC | TGC | AAA | GCG | GAT |    208
         |        PflM I         |
                           |  Dra II   |
                           |   Pss I   |

      i     Q     r     y     f     y     n     a     k
     41    42    43    44    45    46    47    48    49
   | ATC | CAG | CGT | TAT | TTC | TAC | AAC | GCT | AAA |    235

      E     g     L     c     q     t     f     S     y     g     g
     50    51    52    53    54    55    56    57    58    59    60
   | GAG | GGC | CTG | TGC | CAG | ACC | TTT | TCG | TAC | GGT | GGT |    268

      c     r     a     k     r     n     n     f     k
     61    62    63    64    65    66    67    68    69
   | TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA |    295
               |    Esp I    |

      s     W     e     d     c     m     r     t     c     g
     70    71    72    73    74    75    76    77    78    79
   | TCG | TGG | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT |    325

      g     a
     80    81
   | GGC | GCC |
   |   Bbe I   |
   |   Nar I   |
Table 39: vgDNA to vary set#2 BPTI 2.2

                              +
                  g   P   c   X   a   D
                  35  36  37  38  39  40
 5'- eg aca cere GGG CCC TGC mrA GCG GAT            208
     | spacer |  | Apa I |

  +       +   +
  X   Q   X   X   f   y   n   a   k
  41  42  43  44  45  46  47  48  49
 rwA CAG rvk TwT TTC TAC AAC GCT AAA                235

      +           +   +
  E   X   L   c   X   X   f   S   y   g   g
  50  51  52  53  54  55  56  57  58  59  60
 GAG qfk CTG TGC qfk qfk TTT TCG TAC GGT GGT -3'    268     olig#29  94 nts
                 91 nts  olig#30  3'- g cca cca

 Overlap = 15 (11 CG, 4 AT)

  c   r   a   k   r   n   n   f   k
  61  62  63  64  65  66  67  68  69
 TGC CGT GCT AAG CGT AAC AAC TTT AAA
 acg gca cga ttc gca ttg ttg aaa ttt
         | Esp I |

          +
  s   W   X   d   C   m
  70  71  72  73  74  75
 TCG TGG qfk GAT TGC ATG
 agc acc **m cta acg tac gcg acc tgc -5'
                 | Sph I | spacer |

 k = equal parts of T and G; v = equal parts of C, A, and G;
 m = equal parts of C and A; r = equal parts of A and G;
 w = equal parts of A and T;
 q = (.26 T, .18 C, .26 A, and .30 G);
 f = (.22 T, .16 C, .40 A, and .22 G);
 * = complement of symbol above

 Residue          38    41    43    44    51    54    55    72
 Possibilities    4  x  4  x  9  x  2  x  21 x  21 x  21 x  21   = 6.2 x 10^7
 Abundance x 10   2.5   2.5   .833  5.    .663  .397  .437  .602
 Product = 2.3 x 10^-8

 Parent = 1/(4.4 x 10^7)    least favored = 1/(1.25 x 10^9)
 Least favored one-amino-acid substitution from PPBD present
 at 1 in 1.2 x 10^7
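
The per-residue counts in the Possibilities row follow from enumerating each variegated codon against the standard genetic code. The sketch below is illustrative only: the names `MIXES`, `CODON_TABLE`, and `codon_distribution` are not from the specification, the mixture fractions are copied from the legend of Table 39, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, indexed in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Nucleotide mixtures as defined in the legend of Table 39.
MIXES = {
    "k": {"T": 0.5, "G": 0.5},
    "m": {"C": 0.5, "A": 0.5},
    "r": {"A": 0.5, "G": 0.5},
    "w": {"A": 0.5, "T": 0.5},
    "v": {"C": 1 / 3, "A": 1 / 3, "G": 1 / 3},
    "q": {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30},
    "f": {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22},
}


def codon_distribution(codon):
    """Map each encoded amino acid (or "*") to its expected abundance."""
    dist = {}
    # A symbol that is not a mixture is treated as a single pure base.
    per_position = [MIXES.get(s, {s.upper(): 1.0}).items() for s in codon]
    for (b1, p1), (b2, p2), (b3, p3) in product(*per_position):
        aa = CODON_TABLE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + p1 * p2 * p3
    return dist


for codon in ("rwA", "rvk", "TwT", "qfk"):
    dist = codon_distribution(codon)
    print(codon, len(dist), "".join(sorted(dist)))
```

Under these assumptions the enumeration gives 4 outcomes for rwA, 9 for rvk, 2 for TwT, and 21 for qfk (twenty amino acids plus the TAG stop), in line with the 4, 9, 2, and 21 entered for residues 41, 43, 44, and the qfk positions.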
Table 40: Result of varying set#2 of BPTI 2.2

  l   e
  29  30
 CTC GAG                                            178
 Xho I

  p   P   y   t   g   P   C   E   a   D
  31  32  33  34  35  36  37  38  39  40
 CCG CCA TAT ACT GGG CCC TGC GAG GCG GAT            208
 PflM I
 Apa I

  V   Q   N   F   f   y   n   a   k
  41  42  43  44  45  46  47  48  49
 GTT CAG AAT TTT TTC TAC AAC GCT AAA                235

  E   F   L   c   S   A   f   S   y   g   g
  50  51  52  53  54  55  56  57  58  59  60
 GAG TTT CTG TGC TCT GCT TTT TCG TAC GGT GGT        268

  c   r   a   k   r   n   n   f   k
  61  62  63  64  65  66  67  68  69
 TGC CGT GCT AAG CGT AAC AAC TTT AAA                295
 Esp I

  S   W   Q   d   c   m   r   t   c   g
  70  71  72  73  74  75  76  77  78  79
 TCG TGG CAG GAT TGC ATG CGT ACC TGC GGT            325

  g   a
  80  81
 GGC GCC
 Bbe I
 Nar I
Table 41: vgDNA set#2 of BPTI 2.3

  l   e
  29  30
 - ca acre
 k = equal parts of T and G; m = equal parts of C and A;
 w = equal parts of A and T; n = equal parts of A, C, G, T;
 d = equal parts of A, G, T; v = equal parts of A, C, G;
 q = (.26 T, .18 C, .26 A, and .30 G);
 f = (.22 T, .16 C, .40 A, and .22 G);
 * = complement of symbol above

 Residue          32    34    40    44    50    52    55    57
 Possibilities    6  x  6  x  21 x  6  x  3  x  5  x  21 x  21   = 3 x 10^7
 Abundance x 10
 of PPBD          10/6  10/6  .545  10/6  10/3  30/3  .459  .701
 product = 1.01 x 10^-7

 parent = 1/(1 x 10^7)    least favored = 1/(4 x 10^8)
 Least favored one-amino-acid substitution from PPBD present
 at 1 in [illegible] x 10^7
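
The total in the Possibilities row is simply the product of the per-residue counts. As a quick check, assuming the eight counts are read correctly from the row above (the dictionary name `possibilities` is only illustrative):

```python
import math

# Number of possibilities at each varied residue, as listed in Table 41.
possibilities = {32: 6, 34: 6, 40: 21, 44: 6, 50: 3, 52: 5, 55: 21, 57: 21}

total = math.prod(possibilities.values())
print(f"{total:,} sequences, about {total:.0e}")
```

The product is 30,005,640, which rounds to the 3 x 10^7 stated in the table.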
Table 42: Result of varying set#2 of BPTI 2.3
Claims



1. A method of obtaining a nucleic acid encoding a proteinaceous binding domain that binds a predetermined target 
material, other than the antigen combining site of an antibody which specifically binds said domain, comprising: 


(a) preparing a variegated population of amplifiable genetic packages, said genetic packages being selected 
from the group consisting of cells, spores and viruses, each said genetic package being genetically alterable 
and having an outer surface including a genetically determined outer surface protein, each package including 
a first nucleic acid construct coding for a chimeric potential binding protein, each said chimeric protein
comprising, and each said construct comprising DNA encoding, (i) a potential binding domain which is a mutant
of a stable predetermined domain of a predetermined parental protein, other than a single chain antibody, 
comprising one or more identifiable surface residues, and for which both an affinity molecule and an amino 
acid sequence are either available or obtainable, and (ii) an outer surface transport signal for obtaining the 
display of the potential binding domain on the outer surface of the genetic package, the expression of which 
construct results in the display of said chimeric potential binding protein and its potential binding domain on 
the outer surface of said genetic package; and wherein said variegated population of genetic packages col- 
lectively display a plurality of different potential binding domains, the differentiation among said plurality of 
different potential binding domains occurring through the at least partially random variation of one or more 
predetermined amino acid positions of said parental binding domain to randomly obtain at each said position 
an amino acid belonging to a predetermined set of two or more amino acids, the amino acids of said set 
occurring at said position in statistically predetermined expected proportions, the genetic message
encapsulated by said genetic packages being amplifiable in vitro or by cell culture of said genetic packages and
separable on the basis of the potential binding domain displayed thereon,

(b) causing the expression of said chimeric potential binding proteins and the display of said potential binding 
domains on the outer surface of said packages; 

(c) contacting said packages with the predetermined target material such that said potential binding domains 
and the target material may interact; 

(d) separating packages displaying a potential binding domain that binds the target material from packages 
that do not so bind, on the basis of their ability to bind with the target material in step (c), and 

(e) recovering at least one package displaying on its outer surface a chimeric binding protein comprising a 
stable successful binding domain (SBD) which bound said target, said package comprising nucleic acid en- 
coding said successful binding domain, and amplifying said SBD-encoding nucleic acid in vivo or in vitro. 

2. The method of claim 1 wherein said population of amplifiable genetic packages is characterized by the display of
at least 10^5 but not more than 10^9 different potential binding domains and/or (2) from 1 in 10^4 to 1 in 10^9 of the
packages of said population display the same potential binding domain.

3. The method of claim 1 wherein the level of variegation of the population is chosen such that the packages displaying
potential binding domains obtained by single amino acid substitutions in the amino acid sequence of the parental 
potential binding domain are present in detectable amounts. 

4. The method of claim 1 wherein said signal is provided by a segment of said chimeric protein which is essentially
identical in amino acid sequence with at least a functional portion of a natural outer surface protein encoded by 
said genetic package or a cell naturally infected by said genetic package. 

5. The method of claim 1 wherein the parental potential binding domain is initially chosen to be one which is over
50% homologous with a domain of a known protein, the latter domain having a melting point of at least about 60°C. 

6. The method of claim 5 wherein the initially chosen parental binding protein does not preferentially bind the
predetermined target.

7. The method of claim 1, said target material comprising one or more discrete molecules, said parental potential
binding domain being characterized as a sequence of amino acids, further comprising identifying an interaction 
set of amino acids which are on the surface of the parental potential binding domain and which can all simultane- 
ously touch a single molecule of the target material, and obtaining potential binding domains by substituting a 
different amino acid for one or more of the amino acids in said interaction set. 

8. The method of claim 1 wherein the target material is a non-macromolecular organic compound and the potential
binding domains comprise greater than about 80 amino acid residues.

9. The method of claim 1 wherein the target material is a macromolecular organic compound and the potential binding
domains have less than about 80 amino acid residues. 
10. The method of claim 1 wherein the target material is a mineral insoluble in aqueous solution.

11. The method of claim 1 wherein the target is an inorganic molecule or complex ion that is stable in aqueous solution.

12. The method of claim 1 wherein the target is an organometallic compound that is stable in aqueous solution. 

13. The method of claim 1 wherein the target material is a general protease, wherein the immobilized target material 
is first incubated with an irreversible or covalent inhibitor to inactivate the protease. 

14. The method of claim 1 wherein the amplifiable genetic package is a cell or virus that can be affinity separated and 
retain viability. 

15. The method of claim 5 wherein the known binding protein is an enzyme, the activity of which has a lethal effect 
on the amplifiable genetic package, the host of the amplifiable genetic package, or the target, wherein the majority 
of the nucleic acid constructs code on expression for an analogue of the known binding protein that does not have 
such lethal enzymatic activity. 

16. The method of claim 1 wherein the target contains ionizable groups and the pH of the solutions of the intended 
use and the pH of the affinity separations are chosen so that both the potential binding protein and the target 
remain stable. 

17. The method of claim 1 wherein the target contains ionizable groups, further comprising providing counter ions to
reduce electrostatic repulsion between the potential binding protein and the target. 

18. The method of claim 1 wherein the initial potential binding domain is picked so that, under the conditions of intended
use of the desired binding protein and under the conditions of affinity separation, the potential binding domains
and the target will either have opposite charge or one of them will be neutral.

19. The method of claim 1 wherein the amplifiable genetic package is a bacterial cell. 

20. The method of claim 1 wherein the amplifiable genetic package is a bacterial spore. 

21. The method of claim 1 wherein the amplifiable genetic package is a bacteriophage. 

22. The method of claim 21 wherein the signal is provided by the coat protein of M13 or a segment thereof embodying
an outer surface transport signal. 

23. The method of claim 21 wherein the signal is provided by the gene III protein of M13 or a segment thereof embodying
an outer surface transport signal. 

24. The method of claim 1 wherein the distribution of nucleotides incorporated at each variegated codon is chosen to 
yield substantially equal abundances of acidic and basic amino acids. 

25. The method of claim 1, wherein step (c) further comprises contacting the packages with a second material and 
isolating packages which do not bind that second material. 

26. The method of claim 1, wherein after obtaining a novel binding protein recognizing a first predetermined target,
the novel binding protein is chosen as a parental potential binding protein for the isolation of a derivative protein 
which also binds to a second predetermined target. 

27. The method of claim 1 wherein the initially chosen parental potential binding domain is selected from the group 
consisting of (a) binding domains of bovine pancreatic trypsin inhibitor, crambin, ovomucoid, T4 lysozyme, hen 
egg white lysozyme, ribonuclease, and azurin, and (b) domains at least 50% homologous with any of the foregoing
domains and which have a melting point of at least 60°C.

28. The method of claim 19 wherein the outer surface transport signal is provided by the lamB protein or a segment
thereof embodying an outer surface transport signal. 
29. A chimeric protein comprising (i) at least a segment of an outer surface protein of a filamentous phage, said
segment providing an outer surface transport signal recognized by a cell infected by said phage such that the 
chimeric protein is assembled into the coat of phage particles produced by said cell, and (ii) a stable, proteinaceous 
binding domain, other than a single chain antibody, said domain comprising one or more identifiable surface res- 
idues, that binds a predetermined target material, other than the antigen combining site of an antibody which 
specifically binds said domain, the target being bound sufficiently strongly so that the dissociation constant of the 
binding domain:target complex is less than 10^-6 moles/liter, and that is heterologous to said phage.

30. A virus bearing on its outer surface a chimeric binding protein, said protein comprising (i) a proteinaceous binding 
domain, other than a single chain antibody, which is sufficiently stable in structure to have a melting point of at 
least 40°C, and which binds to a target, other than the variable domain of an antibody, sufficiently strongly so that 
the dissociation constant of the binding domain:target complex is less than 10^-6 moles/liter, and (ii) at least a
functional portion of a coat protein of said virus, said portion acting, when the chimeric protein is produced in a 
suitable host cell, to cause the display of the chimeric binding protein or a processed form thereof on the outer 
surface of the virus, said binding domain being capable of binding to a target material to which said coat protein 
does not preferentially bind, said binding domain being foreign to the native coat proteins of said virus. 

31. The method of claim 1 wherein in at least one instance the amino acid residues varied in a first assortment of 
potential binding domains are left constant in the next assortment of potential binding domains. 

32. The method of claim 1 wherein the method of preparing a population of variegated genetic packages comprises 
the preparation of a population of variegated DNA encoding a potential binding domain which is a mutant of a 
stable predetermined domain of a predetermined parental protein, other than a single chain antibody comprising 
one or more identifiable surface residues, and for which both an affinity molecule and an amino acid sequence 
are either available or obtainable, wherein the distribution of nucleotides incorporated at each variegated codon 
is chosen to yield substantially equal abundances of acidic and basic amino acids. 

33. The protein of claim 29, wherein the protein comprises a first foreign domain recognizing a first target material and 
a second foreign domain recognizing a second target material. 

34. The method of claim 1 wherein the initially chosen parental potential binding domain is at least 50% homologous 
with the binding domain of bovine pancreatic trypsin inhibitor. 



35. The method of claim 3 wherein the initially chosen parental potential binding protein has at least one stable binding 
domain and said domain has a melting point of at least 60°C and is stable over a pH range of at least 3.0-8.0. 

36. The method of claim 19 wherein the amplifiable genetic package is a strain of Escherichia coli.

37. The method of claim 21 wherein the amplifiable genetic package is a filamentous phage. 

38. The method of claim 21 wherein the amplifiable genetic package is a derivative of an M13 Escherichia coli
bacteriophage or a derivative of the Pseudomonas aeruginosa filamentous phage Pf1.

39. The method of claim 24 wherein the distribution of nucleotides incorporated at each variegated codon is further 
chosen to yield the largest value for the quantity ((1.-abundance(stop codons)) times (abundance of the least 
abundant amino acid)/(abundance of the most abundant amino acid)). 

40. The chimeric protein of claim 29 wherein said foreign domain binds to a target material not preferentially bound 
by said outer surface protein. 

41. The method of claim 32 wherein the distribution of nucleotides incorporated at each variegated codon is further 
chosen to yield the largest value for the quantity ((1.-abundance(stop codons)) times (abundance of the least
abundant amino acid)/(abundance of the most abundant amino acid)). 



42. The method of claim 1 wherein the predetermined parental protein is not natively associated with the genetic 
package. 

43. The method of claim 1 wherein the predetermined parental protein is not a surface protein of the genetic package. 
44. The method of claim 1 wherein the predetermined parental protein is not a surface protein of any cell or virus.

45. The method of claim 1 wherein the predetermined parental protein is not a bacterial or viral protein. 

46. The method of claim 1 wherein, for at least one codon, a desired mix of amino acids is obtained by use of a non- 
equimolar mixture of nucleotides in synthesizing at least one base position of that codon. 

47. The method of claim 1 wherein the affinity of the successful binding domain for the target is substantially greater 
than the affinity of the parental binding domain for the target. 

48. The method of claim 1 wherein the outer surface of the genetic package presents, not only said chimeric protein, 
but also the cognate wild type outer surface protein. 

49. The method of claim 48 wherein at least one of the genes encoding said chimeric protein and said wild type outer 
surface protein is under the control of a regulatable promotor, allowing the ratio between chimeric and wild type 
protein to be controlled.

50. The method of claim 1 wherein the outer surface protein is a coat protein derived from gene III of a filamentous 
phage.

51. The method of claim 48 wherein the potential binding domain is linked to an exposed amino or carboxy terminus 
of the mature wild type coat protein. 

52. The method of claim 1 wherein the insertion site for the initial potential binding domain is at a domain boundary 
of a coat protein.

53. The method of claim 1 wherein the insertion site for the initial potential binding domain is at a turn or loop of a coat 
protein. 

54. The method of claim 1 wherein the outer surface protein is the gene VIII protein of M13.

55. The method of claim 1 wherein a package is recovered by elution as a result of a decrease in pH, an increase in 
the concentration of a salt or other solute that weakens non-covalent interactions, temperature, or the concentration 
of soluble target material, or a combination thereof.

56. The method of claim 1 wherein the target material is immobilized on a matrix and the genetic package is amplified 
in situ on the matrix. 

57. The method of claim 1 wherein the target material is immobilized on a matrix and a package is recovered by elution 
after chemically or enzymatically degrading the linkage holding the target to the matrix.

58. The virus of claim 30, said virus further bearing on its outer surface the corresponding wild-type coat protein of 
said virus. 

59. The virus of claim 30 wherein the proteinaceous binding domain is coupled essentially to the amino terminal of
the mature coat protein. 

60. The protein of claim 29 wherein the proteinaceous binding domain is coupled essentially to the amino terminal of 
the mature coat protein. 






61. The protein of claim 29 wherein the outer surface protein is the gVIII (major coat) protein. 

62. The virus of claim 30 wherein the outer surface protein is the gVIII (major coat) protein.



63. The protein of claim 29 wherein the outer surface protein is the gIII protein.

64. The virus of claim 30 wherein the outer surface protein is the gIII protein.
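
The quantity named in claims 39 and 41, and the acid/base balance of claims 24 and 32, can be evaluated for any nucleotide distribution. The sketch below does so for a codon built from the q, f, and k mixtures given in the legends of Tables 39 and 41. It is an illustration under stated assumptions rather than the method of the specification: the function names are hypothetical, the standard genetic code is assumed, "acidic" is taken as Asp plus Glu, and "basic" is taken as Lys plus Arg (His excluded).

```python
from itertools import product

# Standard genetic code, indexed in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})

# Mixtures from the table legends: q (first base), f (second), k (third).
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.5, "G": 0.5}


def abundances(first, second, third):
    """Expected abundance of each amino acid (and "*") for one codon."""
    result = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(
            first.items(), second.items(), third.items()):
        aa = CODON_TABLE[b1 + b2 + b3]
        result[aa] = result.get(aa, 0.0) + p1 * p2 * p3
    return result


def claim_quantity(ab):
    """(1 - abundance(stop)) * (least abundant aa) / (most abundant aa)."""
    per_aa = [ab.get(aa, 0.0) for aa in AMINO_ACIDS]  # absent aa counts as 0
    return (1.0 - ab.get("*", 0.0)) * min(per_aa) / max(per_aa)


ab = abundances(Q, F, K)
acidic = ab["D"] + ab["E"]
basic = ab["K"] + ab["R"]
print(f"stop={ab['*']:.3f} acidic={acidic:.4f} basic={basic:.4f} "
      f"quantity={claim_quantity(ab):.4f}")
```

With these fractions the acidic and basic totals come out at about 0.120 each, and the quantity evaluates to roughly 0.386. A search over candidate mixtures would call `claim_quantity` on each and keep the largest value.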
Patentansprüche

1. Verfahren zum Erhalten einer Nukleinsäure, die für eine proteinartige Bindedomäne kodiert, die ein vorbestimmtes
Zielmaterial bindet, das von der Antigenbindungsstelle eines Antikörpers, der diese Domäne bindet, verschieden
ist, umfassend

(a) das Herstellen einer vielfältigen Population amplifizierbarer genetischer Packungen, wobei die genetischen
Packungen aus der Gruppe ausgewählt sind, die aus Zellen, Sporen und Viren besteht, wobei jede genetische
Packung genetisch veränderbar ist und eine äußere Oberfläche einschließlich eines genetisch determinierten
äußeren Oberflächenproteines hat, wobei jede Packung ein erstes Nukleinsäurekonstrukt enthält, das für ein
mögliches chimäres Bindeprotein kodiert, wobei jedes chimäre Protein (i) und (ii), wie nachstehend definiert,
umfaßt, und jedes dieser Konstrukte DNA umfaßt, die kodiert für (i) eine potentielle Bindedomäne, die eine
Mutante einer stabilen vorherbestimmten Domäne eines vorherbestimmten Elterproteines ist, das von einem
Einzelkettenantikörper verschieden ist und ein oder mehrere identifizierbare Oberflächenreste umfaßt, und
für die sowohl ein Affinitätsmolekül als auch eine Aminosäuresequenz entweder verfügbar oder erhältlich sind,
und (ii) ein äußeres Oberflächentransportsignal, um eine Darstellung der möglichen Bindedomäne auf der
äußeren Oberfläche der genetischen Packung zu erhalten, wobei die Expression des Konstruktes zum
Darstellen des chimären möglichen Bindeproteines und seiner möglichen Bindedomäne auf der äußeren
Oberfläche der genetischen Packung führt; und worin die vielfältige Population genetischer Packungen zusammen
eine Vielzahl von unterschiedlichen potentiellen Bindedomänen aufweist, wobei die Differenzierung unter der
Vielzahl der verschiedenen potentiellen Bindedomänen durch die mindestens teilweise zufällige Variation einer
oder mehrerer vorherbestimmter Aminosäurepositionen der Elterbindedomäne auftritt, um statistisch an jeder
Position eine Aminosäure zu erhalten, die zu einem vorherbestimmten Satz von zwei oder mehr Aminosäuren
gehört, wobei die Aminosäuren des Satzes an der Position in statistisch vorherbestimmten erwarteten
Proportionen auftreten, wobei die genetische Botschaft, die in den genetischen Packungen eingekapselt ist, in
vitro oder durch Zellkultur der genetischen Packungen amplifizierbar und auf der Basis der möglichen
Bindedomäne, die sich darauf zeigt, abtrennbar ist;

(b) das Verursachen der Expression des chimären potentiellen Bindeproteins und des Darstellens der
potentiellen Bindedomänen auf der äußeren Oberfläche der Packungen;

(c) das Inkontaktbringen der Packungen mit dem vorherbestimmten Zielmaterial, so daß die potentiellen
Bindedomänen und das Zielmaterial miteinander wechselwirken können;

(d) das Abtrennen von Packungen, die eine potentielle Bindedomäne aufweisen, die das Zielmaterial bindet,
von Packungen, die nicht so binden, auf der Grundlage ihrer Fähigkeit, an das Zielmaterial in Schritt (c) zu
binden, und

(e) das Gewinnen mindestens einer Packung, die auf ihrer äußeren Oberfläche ein chimäres Bindeprotein
aufweist, das eine stabile erfolgreiche Bindedomäne (SBD) umfaßt, die an das Ziel bindet, wobei die Packung
Nukleinsäure umfaßt, die für die erfolgreiche Bindedomäne kodiert, und Amplifizieren der für SBD kodierenden
Nukleinsäure in vivo oder in vitro.

2. Verfahren nach Anspruch 1, wobei die Population amplifizierbarer genetischer Packungen durch das Darstellen
von mindestens 10^5, aber nicht mehr als 10^9 verschiedenen potentiellen Bindedomänen und/oder (2) dadurch,
daß von 1 in 10^4 bis 1 in 10^9 der Packungen der Population die gleiche potentielle Bindedomäne aufweisen,
gekennzeichnet ist.

3. Verfahren nach Anspruch 1, wobei das Niveau der Vielfaltigkeit der Population so ausgewahlt ist, daB die die 
potentiellen Bindedomanen aufweisenden Packungen, die durch Substitution einzelner Aminosauren in der Ami- 
nosauresequenz der moglichen Elterbindestelle erhalten sind, in nachweisbaren Mengen vorhanden sind. 

4. Verfahren nach Anspruch 1, wobei das Signal durch ein Segment des chimären Proteins zur Verfügung gestellt
wird, das hinsichtlich seiner [...]
Oberflächenproteines identisch ist, das von der genetischen Packung kodiert wird, oder von einer Zelle, die von
der genetischen Packung natürlicherweise infiziert wird.

5. Verfahren nach Anspruch 1, wobei die potentielle Elterbindedomane ursprunglich so ausgewahlt ist, daB sie eine 
ist, die einer Domäne eines bekannten Proteines über 50% homolog ist, wobei die letztgenannte Domäne einen
Schmelzpunkt von mindestens ungefähr 60°C hat.

6. Verfahren nach Anspruch 5, wobei das ursprunglich gewahlte Elterbindeprotein nicht bevorzugt an das vorherbe- 
stimmte Ziel bindet. 

7. Verfahren nach Anspruch 1 , wobei das Zielmaterial ein oder mehrere diskrete Molekule umfaGt und die potentielle 
Elterbindedomane als eine Aminosauresequenz gekennzeichnet ist, umfassend weiter das Identifizieren eines 
Wechselwirkungssatzes von Aminosauren, die an der Oberflache der potentiellen Elterbindedomane sind und die 
alle gleichzeitig ein einzelnes Molekul der Zielsequenz beruhren konnen, und Erhalten der potentiellen Bindedo- 
manen durch Ersetzen einer oder mehrerer der Aminosauren in dem Wechselwirkungssatzdurch unterschiedliche 
Aminosauren. 

8. Verfahren nach Anspruch 1 , wobei das Zielmaterial eine nicht makromolekulare organ ische Verbindung ist und 
die potentiellen Bindedomanen mehrals ungefahr80 Aminosaurereste umfassen. 

9. Verfahren nach Anspruch 1, wobei das Zielmaterial eine makromolekulare organische Verbindung ist und die po- 
tentiellen Bindedomanen weniger als ungefahr 80 Aminosaurereste haben. 

10. Verfahren nach Anspruch 1, wobei das Zielmaterial ein in wäßriger Lösung unlösliches Mineral ist.

11. Verfahren nach Anspruch 1, wobei das Ziel ein anorganisches Molekül oder komplexes Ion ist, das in wäßriger
Lösung stabil ist.

12. Verfahren nach Anspruch 1, wobei das Ziel eine organometallische Verbindung ist, die in wäßriger Lösung stabil ist.

13. Verfahren nach Anspruch 1, wobei das Zielmaterial eine allgemeine Protease ist, wobei das immobilisierte Ziel- 
material zuerst mit einem irreversiblen oder kovalenten Inhibitor inkubiert wird, urn die Protease zu inaktivieren. 

14. Verfahren nach Anspruch 1, wobei die amplifizierbare genetische Packung eine Zelle oder ein Virus ist, die/das
affinitätsgetrennt werden kann und lebensfähig bleibt.

15. Verfahren nach Anspruch 5, wobei das bekannte Bindeprotein ein Enzym ist, dessen Aktivität eine letale Wirkung
auf die amplifizierbare genetische Packung, den Wirt der amplifizierbaren genetischen Packung oder das Ziel hat,
wobei die Mehrzahl der Nukleinsäurekonstrukte für die Expression eines Analogs des bekannten Bindeproteines
kodiert, das nicht eine solche letale enzymatische Aktivität hat.

16. Verfahren nach Anspruch 1, wobei das Ziel ionisierbare Gruppen enthält und der pH der Lösungen für den
beabsichtigten Gebrauch und der pH der Affinitätstrennung so gewählt sind, daß sowohl das potentielle Bindeprotein
als auch das Ziel stabil bleiben.

17. Verfahren nach Anspruch 1, wobei das Ziel ionisierbare Gruppen enthait, umfassend weiter das Bereitstellen von 
Gegenionen zur Verringerung der eiektrostatischen AbstoGung zwischen dem potentiellen Bindeprotein und dem 
Ziel. 

18. Verfahren nach Anspruch 1, wobei die ursprungliche potentielle Bindedomane so ausgewahlt ist, daB unter den 
Bedingungen der beabsichtigten Verwendung des gewunschten Bindeproteines und unter den Bedingungen der 
Affinitatstrennung die potentielle Bindedomane und das Ziel entweder eine entgegengesetzte Ladung haben oder 
eines davon neutral sein wird. 

19. Verfahren nach Anspruch 1, wobei die amplifizierbare genetische Packung eine bakterielle Zelle ist. 

20. Verfahren nach Anspruch 1, wobei die amplifizierbare genetische Packung eine bakterielle Spore ist. 



21. Verfahren nach Anspruch 1, wobei die amplifizierbare genetische Packung ein Bakteriophage ist. 

22. Verfahren nach Anspruch 21 , wobei das Signal vom Hullprotein von M1 3 oder einem Segment davon bereitgestellt 
wird, das ein auBeres Oberflachentransportsignal darstellt. 






EP 0 436 597 B1 



23. Verfahren nach Anspruch 21, wobei das Signal durch das Gen-III-Protein von M13 oder ein Segment davon
bereitgestellt wird, das ein äußeres Oberflächentransportsignal darstellt.

24. Verfahren nach Anspruch 1, wobei die Verteilung der Nukleotide, die in jedes geänderte Kodon eingebaut sind,
so ausgewählt ist, daß sie zu im wesentlichen gleichen Häufigkeiten von sauren und basischen Aminosäuren führt.

25. Verfahren nach Anspruch 1, wobei Schritt (c) weiter das Inkontaktbringen der Packungen mit einem zweiten
Material und das Isolieren der Packungen, die nicht an das zweite Material binden, umfaßt.

26. Verfahren nach Anspruch 1, wobei nach dem Erhalten eines neuen Bindeproteines, das ein erstes
vorherbestimmtes Ziel erkennt, das neue Bindeprotein als ein potentielles Elterbindeprotein ausgewählt wird für die Isolierung
eines Proteinderivates, das auch an ein zweites vorherbestimmtes Ziel bindet.

27. Verfahren nach Anspruch 1, wobei die ursprünglich ausgewählte potentielle Elterbindedomäne aus der Gruppe
ausgewählt ist, die aus (a) Bindedomänen von Pankreastrypsininhibitor aus Rind, Crambin, Ovomucoid, T4 Lysozym,
Lysozym aus Hühnereiweiß, Ribonuklease und Azurin und (b) Domänen, die mindestens 50% mit einer der
zuvor genannten Domänen homolog sind und einen Schmelzpunkt von mindestens 60°C haben, besteht.

28. Verfahren nach Anspruch 19, wobei das äußere Oberflächentransportsignal vom lamB Protein oder einem
Segment davon bereitgestellt wird, das ein äußeres Oberflächentransportsignal darstellt.

29. Chimäres Protein, umfassend (1) mindestens ein Segment eines äußeren Oberflächenproteines eines filamentösen
Phagen, wobei das Segment ein äußeres Oberflächentransportsignal bereitstellt, das von einer Zelle erkannt
wird, die von dem Phagen infiziert ist, so daß das chimäre Protein in der Hülle von Phagenpartikeln, die von der
Zelle erzeugt werden, zusammengesetzt wird, und (2) eine stabile proteinartige Bindedomäne, die von einem
Einzelkettenantikörper verschieden ist, wobei die Domäne ein oder mehrere identifizierbare Oberflächenreste
umfaßt, die an ein vorherbestimmtes Zielmaterial binden, das von der Antigenbindungsstelle eines Antikörpers, der
die Domäne spezifisch bindet, verschieden ist, wobei das Ziel ausreichend stark gebunden wird, so daß die
Dissoziationskonstante des Bindedomänen:Zielkomplexes weniger als 10^-6 Mol/l beträgt, und wobei die Bindedomäne
zum Phagen heterolog ist.

30. Virus, das auf seiner äußeren Oberfläche ein chimäres Bindeprotein trägt, wobei das Protein (i) eine proteinartige
Bindedomäne umfaßt, die von einem Einzelkettenantikörper verschieden ist und die hinsichtlich ihrer Struktur
ausreichend stabil ist, um einen Schmelzpunkt von mindestens 40°C zu haben, und die ausreichend stark an ein
Ziel, das von der variablen Domäne eines Antikörpers verschieden ist, bindet, so daß die Dissoziationskonstante
des Bindungsdomänen:Zielkomplexes weniger als 10^-6 Mol/l beträgt, und (ii) mindestens einen funktionellen Teil
eines Hüllproteines des Virus umfaßt, wobei der Teil, wenn das chimäre Protein in einer geeigneten Wirtszelle
erzeugt wird, so wirkt, daß das Darstellen des chimären Bindeproteines oder seiner prozessierten Form an der
äußeren Oberfläche des Virus verursacht wird, wobei die Bindedomäne an das Zielmaterial binden kann, an das
das Hüllprotein nicht bevorzugt bindet, wobei die Bindedomäne den nativen Hüllproteinen des Virus fremd ist.

31. Method according to claim 1, wherein in at least one instance the amino acid residues varied in a first array of potential binding domains are held constant in the following array of potential binding domains.

32. Method according to claim 1, wherein the method of preparing a population of varied genetic packages comprises preparing a population of varied DNA encoding a potential binding domain that is a mutant of a stable predetermined domain of a predetermined parental protein, other than a single-chain antibody, comprising one or more identifiable surface residues, and for which both an affinity molecule and an amino acid sequence are either available or obtainable, wherein the distribution of nucleotides incorporated at each varied codon is selected so that it yields substantially equal frequencies of acidic and basic amino acids.

33. Protein according to claim 29, wherein the protein comprises a first foreign domain that recognizes a first target material, and a second foreign domain that recognizes a second target material.

34. Method according to claim 1, wherein the initially chosen parental potential binding domain is at least 50% homologous with the binding domain of bovine pancreatic trypsin inhibitor.

35. Method according to claim 3, wherein the initially chosen parental potential binding protein has at least one stable binding domain, and this domain has a melting point of at least 60°C and is stable over a pH range of at least 3.0 to 8.0.

36. Method according to claim 19, wherein the amplifiable genetic package is a strain of Escherichia coli.

37. Method according to claim 21, wherein the amplifiable genetic package is a filamentous phage.

38. Method according to claim 21, wherein the amplifiable genetic package is a derivative of an Escherichia coli M13 bacteriophage or a derivative of the filamentous phage Pf1 from Pseudomonas aeruginosa.

39. Method according to claim 24, wherein the distribution of nucleotides incorporated at each varied codon is further selected to give the largest value of the quantity ((1 - frequency(stop codons)) x (frequency of the least frequent amino acid)/(frequency of the most frequent amino acid)).

40. Chimeric protein according to claim 29, wherein the foreign domain binds a target material that is not preferentially bound by the outer surface protein.

41. Method according to claim 32, wherein the distribution of nucleotides incorporated at each varied codon is further selected to give the highest value of the quantity ((1 - frequency(stop codons)) x (frequency of the least frequent amino acid)/(frequency of the most frequent amino acid)).

42. Method according to claim 1, wherein the predetermined parental protein is not natively associated with the genetic package.

43. Method according to claim 1, wherein the predetermined parental protein is not a surface protein of the genetic package.

44. Method according to claim 1, wherein the predetermined parental protein is not a surface protein of a cell or of a virus.

45. Method according to claim 1, wherein the predetermined parental protein is not a bacterial or viral protein.



46. Method according to claim 1, wherein, for at least one codon, a desired mixture of amino acids is obtained by using a non-equimolar mixture of nucleotides in the synthesis of at least one base position of the codon.

47. Method according to claim 1, wherein the affinity of the successful binding domain for the target is substantially greater than the affinity of the parental binding domain for the target.

48. Method according to claim 1, wherein the outer surface of the genetic packages displays not only the chimeric protein but also the corresponding wild-type outer surface protein.

49. Method according to claim 48, wherein at least one of the genes encoding the chimeric protein and the wild-type outer surface protein is under the control of a regulatable promoter that allows the ratio of chimeric to wild-type protein to be controlled.

50. Method according to claim 1, wherein the outer surface protein is a coat protein derived from gene III of a filamentous phage.

51. Method according to claim 48, wherein the potential binding domain is attached to an exposed amino or carboxyl terminus of the mature wild-type coat protein.

52. Method according to claim 1, wherein the insertion site for the initial potential binding domain is at a domain boundary of a coat protein.

53. Method according to claim 1, wherein the insertion site for the initial potential binding domain is at a turn or a loop of a coat protein.

54. Method according to claim 1, wherein the outer surface protein is the gene VIII protein of M13.

55. Method according to claim 1, wherein a package is recovered by elution as a result of a lowering of the pH, an increase in the concentration of a salt or other solute that weakens non-covalent interactions, in the temperature, or in the concentration of a soluble target material, or a combination thereof.

56. Method according to claim 1, wherein the target material is immobilized on a matrix and the genetic package is amplified in situ on the matrix.

57. Method according to claim 1, wherein the target material is immobilized on a matrix and the package is recovered by elution after chemical or enzymatic degradation of the linkage holding the target to the matrix.

58. Virus according to claim 30, wherein the virus bears on its outer surface the corresponding wild-type coat protein of the virus.

59. Virus according to claim 30, wherein the proteinaceous binding domain is coupled essentially to the amino terminus of the mature coat protein.

60. Protein according to claim 29, wherein the proteinaceous binding domain is attached essentially to the amino terminus of the mature coat protein.

61. Protein according to claim 29, wherein the outer surface protein is the gVIII (major coat) protein.

62. Virus according to claim 30, wherein the outer surface protein is the gVIII (major coat) protein.

63. Protein according to claim 29, wherein the outer surface protein is the gIII protein.

64. Virus according to claim 30, wherein the outer surface protein is the gIII protein.



Claims

1. Method of obtaining a nucleic acid encoding a proteinaceous binding domain that binds a predetermined target material, other than the antigen-combining site of an antibody that specifically binds said domain, in which:

(a) a varied population of amplifiable genetic packages is prepared, said genetic packages being selected from the group consisting of cells, spores and viruses, each of said genetic packages being genetically alterable and having an outer surface including a genetically determined outer surface protein, each package including a first nucleic acid construct encoding a chimeric potential binding protein, each of said chimeric proteins comprising, and each of said constructs comprising DNA encoding, (i) a potential binding domain that is a mutant of a predetermined stable domain of a predetermined parental protein, other than a single-chain antibody, comprising one or more identifiable surface residues, and for which both an affinity molecule and an amino acid sequence are available or obtainable, and (ii) an outer surface transport signal for obtaining the display of the potential binding domain on the outer surface of the genetic package, the expression of said construct resulting in the display of said chimeric potential binding protein and its potential binding domain on the outer surface of said genetic package; and wherein said varied population of genetic packages collectively displays a plurality of potential binding domains, the differentiation among said plurality of different potential binding domains occurring through the at least partially random variation of one or more positions [...] so as to obtain at random, at each of said positions, an amino acid belonging to a predetermined set of two or more amino acids, the amino acids of said set occurring at said position in statistically predetermined expected proportions, the genetic message encapsulated by said genetic packages being amplifiable in vitro or by cell culture of said genetic packages and separable on the basis of the potential binding domain displayed thereon,

(b) expression of said chimeric potential binding proteins and display of said potential binding domains on the outer surface of said packages are caused;

(c) said packages are contacted with the predetermined target material such that said potential binding domains and the target material can interact;

(d) packages displaying a potential binding domain that bind the target material are separated from packages that do not so bind, on the basis of their ability to bind the target material in step (c), and

(e) at least one package is recovered that displays on its outer surface a chimeric binding protein comprising a stable successful binding domain (SBD) that bound to said target, said package comprising a nucleic acid encoding said successful binding domain, and said SBD-encoding nucleic acid is amplified in vivo or in vitro.

2. Method of claim 1, wherein said population of amplifiable genetic packages is characterized by (1) the display of at least 10⁵ but not more than 10⁹ different potential binding domains, and/or (2) from 1 in 10^[...] to 1 in 10⁹ of the packages of said population displaying the same potential binding domain.

3. Method of claim 1, wherein the level of variation of the population is chosen such that packages displaying potential binding domains obtained by single amino acid substitutions in the amino acid sequence of the parental potential binding domain are present in detectable amounts.

4. Method of claim 1, wherein said signal is provided by a segment of said chimeric protein that is essentially identical in amino acid sequence to at least a functional portion of a naturally occurring outer surface protein encoded by said genetic package or by a cell naturally infected by said genetic package.

5. Method of claim 1, wherein the parental potential binding domain is initially chosen as being a domain that is more than 60% homologous with a domain of a known protein, the latter domain having a melting point of at least about 60°C.

6. Method of claim 5, wherein the initially chosen parental binding protein does not bind preferentially to the predetermined target.

7. Method of claim 1, said target material comprising one or more discrete molecules, said parental potential binding domain being characterized as an amino acid sequence, wherein, further, an interaction set of amino acids is identified that are on the surface of the parental potential binding domain and that can all simultaneously touch a single molecule of the target material, and the potential binding domains are obtained by replacing one or more of the amino acids in said interaction set with a different amino acid.

8. Method of claim 1, wherein the target material is a non-macromolecular organic compound and the potential binding domains comprise more than about 80 amino acid residues.

9. Method of claim 1, wherein the target material is a macromolecular organic compound and the potential binding domains have fewer than about 80 amino acid residues.

10. Method of claim 1, wherein the target material is a mineral insoluble in aqueous solution.

11. Method of claim 1, wherein the target [...] in aqueous solution.

12. Method of claim 1, wherein the target is an organometallic compound that is stable in aqueous solution.

13. Method of claim 1, wherein the target material is a general protease, and wherein the immobilized target material is first incubated with an irreversible or covalent inhibitor to inactivate the protease.

14. Method of claim 1, wherein the amplifiable genetic package is a cell or a virus that can be separated by affinity and retains viability.

15. Method of claim 5, wherein the known binding protein is an enzyme whose activity has a lethal effect on the amplifiable genetic package, the host of the amplifiable genetic package, or the target, and wherein the majority of the nucleic acid constructs code for the expression of an analogue of the known binding protein that lacks such lethal enzymatic activity.

16. Method of claim 1, wherein the target contains ionizable groups, and the pH of the solutions of the intended application and the pH of the affinity separations are chosen so that both the potential binding protein and the target remain stable.

17. Method of claim 1, wherein the target contains ionizable groups, and wherein, further, counter-ions are provided to reduce the electrostatic repulsion between the potential binding protein and the target.

18. Method of claim 1, wherein the initial potential binding domain is chosen so that, under the conditions of intended use of the desired binding protein and under the conditions of the affinity separation, the potential binding domains and the target have opposite charges, or one of them is neutral.

19. Method of claim 1, wherein the amplifiable genetic package is a bacterial cell.

20. Method of claim 1, wherein the amplifiable genetic package is a bacterial spore.

21. Method of claim 1, wherein the amplifiable genetic package is a bacteriophage.

22. Method of claim 21, wherein the signal is provided by the coat protein of M13 or a segment thereof constituting an outer surface transport signal.

23. Method of claim 21, wherein the signal is provided by the gene III protein of M13 or a segment thereof constituting an outer surface transport signal.

24. Method of claim 1, wherein the distribution of nucleotides incorporated at each varied codon is chosen to give substantially equal amounts of acidic and basic amino acids.

25. Method of claim 1, wherein step (c) further comprises contacting the packages with a second material and isolating the packages that do not bind this second material.

26. Method of claim 1, wherein, after a new binding protein recognizing a first predetermined target has been obtained, the new binding protein is chosen as the parental potential binding protein for the isolation of a derived protein that also binds a second predetermined target.

27. Method of claim 1, wherein the initially chosen parental potential binding domain is selected from the group consisting of (a) the binding domains of bovine pancreatic trypsin inhibitor, crambin, ovomucoid, T4 lysozyme, hen egg white lysozyme, and azurin, and (b) domains that are at least 50% homologous with any one of the foregoing domains and that have a melting point of at least 60°C.

28. Method of claim 19, wherein the outer surface transport signal is provided by the lamB protein or a segment thereof constituting an outer surface transport signal.

29. Chimeric protein comprising (i) at least [...] filamentous phage, said segment providing an outer surface transport signal recognized by a cell infected by said phage such that the chimeric protein is assembled into the coat of the phage particles produced by said cell, and (ii) a stable proteinaceous binding domain, other than a single-chain antibody, said domain comprising one or more identifiable surface residues, which binds a predetermined target material, other than the antigen-combining site of an antibody that specifically binds said domain, the target being bound sufficiently strongly that the dissociation constant of the binding domain:target complex is less than 10⁻⁶ mol/l, and which is heterologous to said phage.

30. Virus bearing on its outer surface a chimeric binding protein, said protein comprising (i) a proteinaceous binding domain, other than a single-chain antibody, which is sufficiently stable in structure to have a melting point of at least 40°C, and which binds a target, other than the variable domain of an antibody, sufficiently strongly that the dissociation constant of the binding domain:target complex is less than 10⁻⁶ mol/l, and (ii) at least a functional portion of a coat protein of said virus, said portion acting, when the chimeric protein is produced in a suitable host cell, to cause the display of the chimeric binding protein or a processed form thereof on the outer surface of the virus, said binding domain being capable of binding a target material to which said coat protein does not preferentially bind, and said binding domain being foreign to the native coat proteins of said virus.

31. Method of claim 1, wherein in at least one instance the amino acid residues varied in a first assortment of potential binding domains are held constant in the following assortment of potential binding domains.

32. Method of claim 1, wherein the method of preparing a population of varied genetic packages comprises preparing a population of varied DNA encoding a potential binding domain that is a mutant of a predetermined stable domain of a predetermined parental protein, other than a single-chain antibody, comprising one or more identifiable surface residues, and for which both an affinity molecule and an amino acid sequence are available or obtainable, wherein the distribution of nucleotides incorporated at each varied codon is chosen to give substantially equal amounts of acidic and basic amino acids.

33. Protein of claim 29, the protein comprising a first foreign domain recognizing a first target material and a second foreign domain recognizing a second target material.

34. Method of claim 1, wherein the initially chosen parental potential binding domain is at least 50% homologous with the binding domain of bovine pancreatic trypsin inhibitor.

35. Method of claim 3, wherein the initially chosen parental potential binding protein has at least one stable binding domain, and said domain has a melting point of at least 60°C and is stable over a pH range of at least 3.0 to 8.0.

36. Method of claim 19, wherein the amplifiable genetic package is a strain of Escherichia coli.

37. Method of claim 21, wherein the amplifiable genetic package is a filamentous phage.

38. Method of claim 21, wherein the amplifiable genetic package is a derivative of an Escherichia coli M13 bacteriophage or a derivative of the filamentous phage Pf1 of Pseudomonas aeruginosa.

39. Method of claim 24, wherein the distribution of nucleotides incorporated at each varied codon is further chosen to give the largest value of the quantity ((1 - abundance(stop codons)) multiplied by (abundance of the least abundant amino acid)/(abundance of the most abundant amino acid)).

40. Chimeric protein of claim 29, wherein said foreign domain binds a target material that is not preferentially bound by said outer surface protein.

41. Method of claim 32, wherein the distribution of nucleotides incorporated at each varied codon is further chosen to give the largest value of the quantity ((1 - abundance(stop codons)) multiplied by (abundance of the least abundant amino acid)/(abundance of the most abundant amino acid)).

42. Method of claim 1, wherein the predetermined parental protein is not natively associated with the genetic package.

43. Method of claim 1, wherein the predetermined parental protein is not a surface protein of the genetic package.

44. Method of claim 1, wherein the predetermined parental protein is not a surface protein of any cell or virus.

45. Method of claim 1, wherein the predetermined parental protein is not a bacterial or viral protein.

46. Method of claim 1, wherein, for at least one codon, a desired mixture of amino acids is obtained by using a non-equimolar mixture of nucleotides in synthesizing at least one base position of that codon.

47. Method of claim 1, wherein the affinity of the successful binding domain for the target is substantially greater than the affinity of the parental binding domain for the target.

48. Method of claim 1, wherein the outer surface of the genetic package displays not only said chimeric protein but also the cognate wild-type outer surface protein.

49. Method of claim 48, wherein at least one of the genes encoding said chimeric protein and said wild-type outer surface protein is under the control of a regulatable promoter, allowing the ratio of chimeric protein to wild-type protein to be adjusted.

50. Method of claim 1, wherein the outer surface protein is a coat protein derived from gene III of a filamentous phage.

51. Method of claim 48, wherein the potential binding domain is linked to an exposed amino or carboxy terminus of the mature wild-type coat protein.

52. Method of claim 1, wherein the insertion site for the initial potential binding domain is at a domain boundary of a coat protein.

53. Method of claim 1, wherein the insertion site for the initial potential binding domain is at a turn or a loop of a coat protein.

54. Method of claim 1, wherein the outer surface protein is the gene VIII protein of M13.

55. Method of claim 1, wherein a package is recovered by elution following a lowering of pH, an increase in the concentration of a salt or other solute that weakens non-covalent interactions, in the temperature, or in the concentration of soluble target material, or a combination of these.

56. Method of claim 1, wherein the target material is immobilized on a matrix and the genetic package is amplified in situ on the matrix.

57. Method of claim 1, wherein the target material is immobilized on a matrix and a package is recovered by elution after chemical or enzymatic degradation of the linkage holding the target to the matrix.

58. Virus of claim 30, said virus further bearing on its outer surface the corresponding wild-type coat protein of said virus.

59. Virus of claim 30, wherein said proteinaceous binding domain is coupled essentially to the amino terminus of the mature coat protein.

60. Protein of claim 29, wherein the proteinaceous binding domain is coupled essentially to the amino terminus of the mature coat protein.

61. Protein of claim 29, wherein the outer surface protein is the gVIII (major coat) protein.

62. Virus of claim 30, wherein the outer surface protein is the gVIII (major coat) protein.

63. Protein of claim 29, wherein the outer surface protein is the gIII protein.

64. Virus of claim 30, wherein the outer surface protein is the gIII protein.
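
The quantity named in claims 39 and 41, and the acidic/basic balance named in claims 24 and 32, can be evaluated for any proposed nucleotide distribution. The following is a minimal illustrative sketch, not part of the claims. It assumes the standard genetic code, treats "acidic" as Asp and Glu and "basic" as Lys and Arg, and uses hypothetical function names. The equimolar NNN codon and the NNK codon (G or T in the third position) are examples chosen here for illustration only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
# "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residue_frequencies(pos1, pos2, pos3):
    """Expected frequency of each encoded residue (and of stop, "*").

    Each argument maps a base letter to its fraction in the nucleotide
    mixture used for that codon position; bases not listed count as zero.
    """
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        p = (pos1.get(codon[0], 0.0)
             * pos2.get(codon[1], 0.0)
             * pos3.get(codon[2], 0.0))
        freqs[aa] = freqs.get(aa, 0.0) + p
    return freqs


def claim_39_quantity(freqs):
    """(1 - freq(stop)) * freq(least frequent aa) / freq(most frequent aa)."""
    amino = [f for aa, f in freqs.items() if aa != "*"]
    return (1.0 - freqs["*"]) * min(amino) / max(amino)


def acid_base_frequencies(freqs):
    """Return (acidic, basic) totals; assumes acidic = D, E and basic = K, R."""
    return freqs["D"] + freqs["E"], freqs["K"] + freqs["R"]


# Illustrative distributions (assumptions, not taken from the claims).
EQUIMOLAR = dict.fromkeys(BASES, 0.25)
NNN = residue_frequencies(EQUIMOLAR, EQUIMOLAR, EQUIMOLAR)
NNK = residue_frequencies(EQUIMOLAR, EQUIMOLAR, {"G": 0.5, "T": 0.5})

print("NNN quantity:", claim_39_quantity(NNN))
print("NNK quantity:", claim_39_quantity(NNK))
print("NNK acidic, basic:", acid_base_frequencies(NNK))
```

Under these assumptions the quantity is 61/384 (about 0.159) for NNN and 31/96 (about 0.323) for NNK, so restricting the third base already raises it. NNK still encodes twice as many basic as acidic residues (4/32 against 2/32), which is the kind of imbalance a non-equimolar mixture as in claim 46 could be tuned to remove.
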

FIG. 5C.



